Print Page - AfxStr - Unicode String Functions

Title: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 01:35:30 AM

I have translated some of Paul's FF_ string functions to unicode using the technique of temporary types. I also have added a namespace (Afx.CBStrClass) to the CBSTR class. More to come in the next days.

Code Select


' ########################################################################################
' Microsoft Windows
' File: AfxStr.inc
' Contents: String wrapper functions.
' Compiler: FreeBasic 32 & 64-bit, Unicode.
' Copyright (c) 2016 Jose Roca. Freeware. Use at your own risk.
' THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
' EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
' MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
' ########################################################################################

#pragma once
#include once "windows.bi"
#include once "win/ole2.bi"
#include once "Afx/CbStr.inc"
USING Afx.CBStrClass

' ========================================================================================
' Returns a copy of a string with characters or strings removed.
' If cbMatchStr is not present in cbMainStr, all of cbMainStr is returned intact.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxRemove("[]Hello[]", "[]")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRemove (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DO
      DIM nPos AS LONG = INSTR(cbOutStr, cbMatchStr)
      IF nPos = 0 THEN EXIT DO
      cbOutStr = LEFT(**cbOutStr, nPos - 1) & MID(**cbOutStr, nPos + LEN(cbMatchStr))
   LOOP
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a copy of a string with characters or strings removed.
' If cbMatchStr is not present in cbMainStr, all of cbMainStr is returned intact.
' cbMatchStr specifies a list of single characters to be searched for individually, 
' a match on any one of which will cause that character to be removed from the result.
' This function is case-sensitive. 
' Usage example:
' Removing all "b", "a", and "c"
' DIM cbs AS CBSTR = AfxRemoveAny("abacadabra", "bac")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRemoveAny (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DIM nLen AS LONG =  LEN(cbMatchStr)
   DIM i AS LONG
   FOR i = 1 TO nLen
      cbOutStr = AfxRemove(cbOutStr, MID(**cbMatchStr, i, 1))
   NEXT
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Within a specified string, replace all occurrences of one string with another string.
' Replaces all occurrences of cbMatchStr in cbMainStr with cbReplaceWith  
' The replacement can cause cbMainStr to grow or condense in size. 
' When a match is found, the scan for the next match begins at the position immediately
' following the prior match.
' This function is case-sensitive. 
' Usage example:
' DIM cbs AS CBSTR = AfxReplace("Hello World", "World", "Earth")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxReplace (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR, BYREF cbReplaceWith AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DIM nPos AS LONG = 1
   DO
      nPos = INSTR(nPos, cbOutStr, cbMatchStr)
      IF nPos = 0 THEN EXIT DO
      cbOutStr = LEFT(**cbOutStr, nPos - 1) & **cbReplaceWith & MID(**cbOutStr, nPos + LEN(cbMatchStr))
      nPos += LEN(cbReplaceWith)
   LOOP
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Within a specified string, replace all occurrences of any of the individual characters
' specified in the cbMatchStr string.
' cbReplaceWith must be a single character. This function does not replace words therefore
' cbMainStr will be the same size - it will not shrink or grow.
' This function is case-sensitive.      
' ========================================================================================
PRIVATE FUNCTION AfxReplaceAny (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR, BYREF cbReplaceWith AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF LEN(cbMainStr) = 0 THEN RETURN cbOutStr
   IF LEN(cbMatchStr) = 0 THEN RETURN cbOutStr
   IF LEN(cbReplaceWith) <> 1 THEN RETURN cbOutStr
   DIM i AS LONG, x AS LONG
   FOR x = 1 TO LEN(cbMatchStr)
      FOR i = 1 TO LEN(cbMainStr)
         IF MID(**cbMatchStr, x, 1) = MID(**cbMainStr, i, 1) THEN
            MID(**cbOutStr, i, 1) = **cbReplaceWith
         END IF
      NEXT
   NEXT
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
'  Count the number of occurrences of specified characters strings within a string.
'  cbMainStr is the string expression in which to count characters.  
'  cbMatchStr is a list of single characters to be searched for individually. A match on 
'  any one of which will cause the count to be incremented for each occurrence of that 
'  character. Note that repeated characters in cbMatchStr will not increase the count.
'  This function is case-sensitive. 
' ========================================================================================
PRIVATE FUNCTION AfxTallyAny (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS LONG

   IF LEN(cbMainStr) = 0 OR LEN(cbMatchStr) = 0 THEN EXIT FUNCTION

   ' // Remove possible duplicates in the matches string
   DIM AS LONG i, nPos
   DIM cbMatches AS CBSTR = TYPE<CBSTR>("")
   FOR i = 1 TO LEN(cbMatchStr)
      nPos = INSTR(cbMatches, MID(**cbMatchStr, i, 1))
      IF nPos = 0 THEN cbMatches = **cbMatches & MID(**cbMatchStr, i, 1)
   NEXT

   ' // Do the count
   DIM AS LONG nCount
   FOR i = 1 TO LEN(cbMatchStr)
      nPos = 1
      DO
         nPos = Instr(nPos, cbMainStr, MID(cbMatches, i, 1))
         IF nPos = 0 THEN EXIT DO
         IF nPos THEN
            nCount += 1
            nPos += 1
         END IF
      LOOP
   NEXT
    
   RETURN nCount

END FUNCTION
' ========================================================================================

' ========================================================================================
' Reverse the contents of a string expression.
' Usage example:
' DIM cbs AS CBSTR = AfxStrReverse("garden")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxStrReverse (BYREF cbMainStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF LEN(cbMainStr) = 0 THEN RETURN cbOutSTr
   DIM AS LONG i, nLen
   nLen = LEN(cbOutStr)
   DIM cbChar AS CBSTR = TYPE<CBSTR>("")
   FOR i = 1 TO nLen \ 2
      cbChar = MID(**cbOutStr, i, 1)
      MID(**cbOutStr, i, 1) = MID(**cbOutStr, nLen - i + 1, 1)
      MID(**cbOutStr, nLen - i + 1, 1) = **cbChar
   NEXT
   RETURN cbOutStr
END FUNCTION
' ========================================================================================


' ========================================================================================
' Complement to the AfxExtract function. Returns the portion of a string following the
' first occurrence of a character or group of characters.
' cbMainStr is searched for the string specified in cbMatchStr If found, all characters
' after cbMatchStr are returned. If cbMatchStr is not present in cbMainStr (or is null) then
' a zero-length empty string is returned.
' nStart is an optional starting position to begin searching. If nStart is not specified,
' position 1 will be used. If nStart is zero, a nul string is returned. If nStart is negative,
' the starting position is counted from right to left: if -1, the search begins at the last
' character; if -2, the second to last, and so forth.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxRemain("Brevity is the soul of wit", "is ")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRemain (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR, BYVAL nStart AS LONG = 1) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF LEN(cbMainStr) = 0 OR LEN(cbMatchStr) = 0 THEN RETURN cbOutStr
   IF nStart = 0 OR nStart > LEN(cbMainStr) THEN RETURN cbOutStr
   IF nStart < 0 THEN nStart = LEN(cbMainStr) + nStart + 1
   DIM nPos AS LONG = INSTR(nStart, **cbMainStr, **cbMatchStr)
   IF nPos = 0 THEN RETURN cbOutStr
   cbOutStr = MID(**cbMainStr, nPos + LEN(cbMatchStr))
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Complement to the AfxExtract function. Returns the portion of a string following the
' first occurrence of a character or group of characters.
' cbMainStr is searched for the string specified in cbMatchStr If found, all characters
' after cbMatchStr are returned. If cbMatchStr is not present in cbMainStr (or is null) then
' a zero-length empty string is returned.
' cbMatchStr specifies a list of single characters to be searched for individually. A match
' on any one of which will cause the extract operation be performed after that character.
' nStart is an optional starting position to begin searching. If nStart is not specified,
' position 1 will be used. If nStart is zero, a nul string is returned. If nStart is negative,
' the starting position is counted from right to left: if -1, the search begins at the last
' character; if -2, the second to last, and so forth.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxRemain("Brevity is the soul of wit", "is ")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRemainAny (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR, BYVAL nStart AS LONG = 1) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF LEN(cbMainStr) = 0 OR LEN(cbMatchStr) = 0 THEN RETURN cbOutStr
   IF nStart = 0 OR nStart > LEN(cbMainStr) THEN RETURN cbOutStr
   IF nStart < 0 THEN nStart = LEN(cbMainStr) + nStart + 1
   DIM i AS LONG, x AS LONG
   FOR i = nStart TO LEN(cbMainStr)
      FOR x = 1 TO LEN(cbMatchStr)
         IF MID(**cbMainStr, i, 1) = MID(**cbMatchStr, x, 1) THEN
            cbOutStr = MID(**cbMainStr, i + 1)
            RETURN cbOutStr
         END IF
      NEXT
   NEXT
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Count the number of occurrences of strings within a string.
' cbMainStr is the string expression in which to count characters.  
' cbMatchStr is the string expression to count all occurrences of. 
' If cbMatchStr is not present in cbMainStr, zero is returned. 
' When a match is found, the scan for the next match begins at the position immediately
' following the prior match.
' This function is case-sensitive. 
' Usage example:
' DIM nCount AS LONG = AfxTally("abacadabra", "ab")
' MessageBoxW 0, WSTR(nCount), "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxTally (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS LONG
   DIM nCount AS LONG
   DIM nPos AS LONG = 1
   DO
      nPos = INSTR(nPos, cbMainStr, cbMatchStr)
      IF nPos = 0 THEN EXIT DO
      nCount += 1
      nPos += LEN(cbMatchStr)
   LOOP
   RETURN nCount
END FUNCTION
' ========================================================================================

' ========================================================================================
' Determine whether each character of a string is present in another string.
' Returns zero if each character in cbMainStr is present in cbMatchStr 
' If not, it returns the position of the first non-matching character in MainString.
' This function is very useful for determining if a string contains only numeric 
' digits, for example.
' This function is case-sensitive.
' If nStart evaluates to a position outside of the string, or if nStart is zero, 
' then the function returns zero.
' ========================================================================================
PRIVATE FUNCTION AfxVerify (BYVAL nStart AS LONG, BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS LONG

   IF VARPTR(nStart) = NULL THEN EXIT FUNCTION
   IF nStart <= 0 OR nStart > LEN(cbMainStr) THEN RETURN 0
   
   ' Get each character in cbMainStr and look for it in cbMatchStr
   DIM AS LONG i, nPos, idx
   For i = nStart To LEN(cbMainStr)
      nPos = Instr(cbMatchStr, MID(cbMainStr, i, 1))
      IF nPos = 0 THEN
         idx = i
         EXIT FOR
      END IF
   NEXT
   FUNCTION =  idx
   
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a left-justified (padded) string.
' If the optional parameter cbPadCharacter not specified, the function pads the string with
' space characters to the left. Otherwise, the function pads the string with the first
' character of cbPadCharacter.
' Usage example:
' DIM cbs AS CBSTR = AfxRSet("FreeBasic", 20, "*")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxLSet (BYREF cbMainStr AS CBSTR, BYVAL nStringLength AS LONG, BYREF cbPadCharacter AS CBSTR = " ") AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = STRING(nStringLength, ASC(**cbPadCharacter, 1))
   MID(**cbOutStr, 1, LEN(cbMainStr)) = **cbMainStr
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a right-justified (padded) string.
' If the optional parameter cbPadCharacter not specified, the function pads the string with
' space characters to the left. Otherwise, the function pads the string with the first
' character of cbPadCharacter.
' Usage example:
' DIM cbs AS CBSTR = AfxRSet("FreeBasic", 20, "*")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRSet (BYREF cbMainStr AS CBSTR, BYVAL nStringLength AS LONG, BYREF cbPadCharacter AS CBSTR = " ") AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = STRING(nStringLength, ASC(**cbPadCharacter, 1))
   MID(**cbOutStr, nStringLength - LEN(cbMainStr) + 1, LEN(cbMainStr)) = **cbMainStr
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing a right-justified (padded) string.
' If the optional parameter cbPadCharacter not specified, the function pads the string with
' space characters to the left. Otherwise, the function pads the string with the first
' character of cbPadCharacter.
' Usage example:
' DIM cbs AS CBSTR = AfxCSet("FreeBasic", 20, "*")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxCSet (BYREF cbMainStr AS CBSTR, BYVAL nStringLength AS LONG, BYREF cbPadCharacter AS CBSTR = " ") AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = STRING(nStringLength, ASC(**cbPadCharacter, 1))
   MID(**cbOutStr, (nStringLength - LEN(cbMainStr)) \ 2 + 1, LEN(cbMainStr)) = **cbMainStr
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
'  Parses a path/file name to extract component parts
'  This function evaluates a text path/file text name, and returns a requested part of the
'  name. The functionality is strictly one of string parsing alone. 
'  cbOption is one of the following words which is used to specify the requested part:
'  PATH
'        Returns the path portion of the path/file Name. That is the text up to and 
'        including the last backslash (\) or colon (:).
'  NAME
'        Return the name portion of the path/file Name. That is the text To the right 
'        of the last backslash (\) or colon (:), ending just before the last period (.).
'  EXTN
'        Returns the extension portion of the path/file name. That is the last 
'        period (.) in the string plus the text to the right of it.
'  NAMEX
'        Returns the name and the EXTN parts combined.
' ========================================================================================
PRIVATE FUNCTION AfxPathName (BYREF cbOption AS CBSTR, BYREF cbFileSpec AS CBSTR) AS CBSTR

   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbFileSpec)
   IF LEN(cbFileSpec) = 0 THEN RETURN cbOutStr
   
   SELECT CASE UCASE(cbOption)

      CASE "PATH"
         DIM nPos AS LONG = InstrRev(cbFileSpec, ANY ":/\")
         IF nPos THEN cbOutStr = LEFT(**cbFileSpec, nPos - 1)
         RETURN cbOutStr

      CASE "NAME"
         ' // Retrieve the full filename
         DIM nPos AS LONG = InstrRev(cbFileSpec, ANY ":/\")
         IF nPos THEN cbOutStr = MID(**cbFileSpec, nPos + 1)
         ' // Retrieve the filename
         nPos = InstrRev(cbOutStr, ".")
         IF nPos THEN cbOutStr = LEFT(**cbOutStr, nPos - 1)
         RETURN cbOutStr

      CASE "NAMEX"
         DIM nPos AS LONG = InStrRev(cbFileSpec, ANY ":/\")
         IF nPos THEN cbOutStr = MID(**cbFileSpec, nPos + 1) ELSE cbOutStr = ""
         RETURN cbOutStr

      CASE "EXTN"
         ' // Retrieve the full filename
         DIM nPos AS LONG = InstrRev(cbFileSpec, ANY ":/\")
         IF nPos THEN cbOutStr = MID(**cbFileSpec, nPos + 1)
         ' // Retrieve the extension
         nPos = InStrRev(cbOutStr, ".")
         IF nPos THEN cbOutStr = MID(**cbOutStr, nPos) ELSE cbOutStr = ""
         RETURN cbOutStr

   END SELECT

END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 01:51:56 AM

Code Select


' ========================================================================================
'  Adds paired characters to the beginning and end of a string.
'  It is particularly useful for enclosing text with parenthesess, quotes, brackets, etc.
'  For example:  AfxWrap("Paul", "<", ">") results in <Paul>
'  If only one wrap character/string is specified then that character or string is used
'  for both sides.
'  For example:  AfxWrap("Paul", "'") results in 'Paul'
'  If no wrap character/string is specified then double quotes are used.
'  For example: AfxWrap("Paul") results in "Paul"
' ========================================================================================
PRIVATE FUNCTION AfxWrap OVERLOAD (BYREF cbMainStr AS CBSTR, BYREF cbLeftChar AS CBSTR, BYREF cbRightChar AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = **cbLeftChar & **cbMainStr & **cbRightChar
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================
' ========================================================================================
PRIVATE FUNCTION AfxWrap OVERLOAD (BYREF cbMainStr AS CBSTR, BYREF cbChar AS CBSTR = CHR(34)) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = **cbChar & **cbMainStr & **cbChar
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 02:08:37 AM

Code Select


' ========================================================================================
' Returns the count of delimited fields from a string expression.
' If cbMainStr is empty (a null string) or contains no delimiter character(s), the string 
' is considered to contain exactly one sub-field. In this case, AfxParseCount returns the value 1.
' Delimiter contains a string (one or more characters) that must be fully matched.
' Delimiters are case-sensitive.
' Usage example:
' DIM nCount AS LONG = AfxParseCount("one,two,three", ",")
' MessageBoxW 0, WSTR(nCount), "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxParseCount (BYREF cbMainStr AS CBSTR, BYREF cbDelimiter AS CBSTR = ",") AS LONG
   DIM nCount AS LONG = 1
   DIM nPos AS LONG = 1
   DO
      nPos = Instr(nPos, cbMainStr, cbDelimiter)
      IF nPos = 0 THEN EXIT DO
      nCount += 1
      nPos += LEN(cbDelimiter)
   LOOP
   RETURN nCount
END FUNCTION
' ========================================================================================

' ========================================================================================
' Return the count of delimited fields from a string expression.
' If MainString is empty (a null string) or contains no delimiter character(s), the string 
' is considered to contain exactly one sub-field. In this case, AfxParseCountAny returns the value 1.
' Delimiter contains a set of characters (one or more), any of which may act as a delimiter character.
' Delimiters are case-sensitive.
' Usage example:
' DIM nCount AS LONG = AfxParseCountAny("1;2,3", ",;")
' MessageBoxW 0, WSTR(nCount), "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxParseCountAny (BYREF cbMainStr AS CBSTR, BYREF cbDelimiter AS CBSTR) AS LONG
   DIM nCount AS LONG = 1
   DIM nPos AS LONG = 1
   DO
      nPos = Instr(nPos, cbMainStr, ANY cbDelimiter)
      IF nPos = 0 THEN EXIT DO
      nCount += 1
      nPos += LEN(cbDelimiter)
   LOOP
   RETURN nCount
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 02:19:18 AM

Code Select


' ========================================================================================
' Return a string consisting of multiple copies of the specified string.
' This function is very similar to STRING (which makes multiple copies of a single character).
' Usage example:
' DIM cbs AS CBSTR = AfxRepeat(5, "Paul")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRepeat (BYVAL nCount AS LONG, BYREF cbMainStr AS CBSTR) AS CBSTR

   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF nCount <= 0 THEN RETURN cbOutStr
    
   ' Create the final full buffer and insert the
   ' strings into it in order to avoid nCount concatenations.
   DIM nLen AS LONG = LEN(cbMainStr)
   cbOutStr = SPACE(nCount * nLen)
   
   DIM i AS LONG
   FOR i = 0 TO nCount - 1
      MID(**cbOutStr, (i * nLen) + 1, nLen) = **cbMainStr
   NEXT
    
   RETURN cbOutStr

END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 02:33:54 AM

Code Select


' ========================================================================================
' Returns a string with nCount characters removed from the left side of the string.
' If nCount is less than one then the entire string is returned. 
' ========================================================================================
PRIVATE FUNCTION AfxClipLeft (BYREF cbMainStr AS CBSTR, BYVAL nCount AS LONG) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF nCount <= 0 THEN RETURN cbOutStr
   DIM nLen AS LONG = LEN(cbMainStr)
   nCount = IIF(nLen < nCount, nLen, nCount)
   cbOutStr = MID(**cbMainStr, nCount + 1)
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string with nCount characters removed from the right side of the string.
' If nCount is less than one then the entire string is returned. 
' ========================================================================================
PRIVATE FUNCTION AfxClipRight (BYREF cbMainStr AS CBSTR, BYVAL nCount AS LONG) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF nCount <= 0 THEN RETURN cbOutStr
   DIM nLen AS LONG = LEN(cbMainStr)
   nCount = nLen - nCount
   nCount = IIF(nLen < nCount, nLen, nCount)
   cbOutStr = LEFT(**cbMainStr, nCount)
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string with nCount characters removed starting at position nStart. The first
' character is considered position 1, the second position 2, etc...
' If nCount or nStart is less than one then the entire string is returned. 
' ========================================================================================
PRIVATE FUNCTION AfxClipMid (BYREF cbMainStr AS CBSTR, BYVAL nStart AS LONG, BYVAL nCount AS LONG) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF (nCount <= 0) OR (nStart <= 0) THEN RETURN cbOutStr
   DIM nLen AS LONG = LEN(cbMainStr)
   cbOutStr = LEFT(**cbMainStr, nStart - 1) & MID(**cbMainSTr, nStart + nCount)
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 02:54:40 AM

Code Select


' ========================================================================================
' Remove paired characters to the beginning and end of a string.
' It is particularly useful for removing text with parenthesess, quotes, brackets, etc.
' For example: AfxUnWrap("<Paul>", "<", ">") results in Paul
' If only one unwrap character/string is specified then that character or string is used for both sides.
' For example: AfxUnWrap("'Paul'", "'") results in Paul
' If no wrap character/string is specified then double quotes are used.
' For example: AfxUnWrap("""Paul""") results in Paul
' ========================================================================================
PRIVATE FUNCTION AfxUnWrap OVERLOAD (BYREF cbMainStr AS CBSTR, BYREF cbLeftChar AS CBSTR, BYREF cbRightChar AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = LTRIM(**cbMainStr, **cbLeftChar)
   cbOutStr = RTRIM(**cbOutStr, **cbRightChar)
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================
' ========================================================================================
PRIVATE FUNCTION AfxUnWrap OVERLOAD (BYREF cbMainStr AS CBSTR, BYREF cbChar AS CBSTR = CHR(34)) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   cbOutStr = LTRIM(**cbMainStr, **cbChar)
   cbOutStr = RTRIM(**cbOutStr, **cbChar)
   FUNCTION = cbOutStr
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 03:29:00 AM

Code Select


' ========================================================================================
' Delete a specified number of characters from a string expression.
' Returns a string based on MainString but with nCount characters deleted 
' starting at position nStart. The first character in the string is position 1, etc.
' ========================================================================================
PRIVATE FUNCTION AfxDelete (BYREF cbMainStr AS CBSTR, BYVAL nStart AS LONG, BYVAL nCount AS LONG) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DIM nLen AS LONG = LEN(cbMainStr)
   IF nLen = 0 OR nStart < 0 OR nCount <= 0 OR nStart > nLen THEN RETURN cbOutStr
   cbOutStr = LEFT(**cbMainStr, nStart) & MID(**cbMainStr, nStart + 1 + nCount, nCount)
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 03:40:46 AM

Code Select


' ========================================================================================
' Inserts a string at a specified position within another string expression.
' Returns a string consisting of cbMainStr with the string cbInsertString inserted 
' at nPosition. If nPosition is greater than the length of cbMainStr or <= zero then 
' cbInsertString is appended to cbMainStr The first character in the string is position 1, etc.
' ========================================================================================
PRIVATE FUNCTION AfxStrInsert (BYREF cbMainStr AS CBSTR, BYREF cbInsertString AS CBSTR, BYVAL nPosition AS LONG) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   IF nPosition > LEN(cbMainStr) OR nPosition <= 0 THEN
      cbOutStr = **cbOutStr & **cbInsertString
   ELSE
      cbOutStr = LEFT(**cbMainStr, nPosition) & **cbInsertString & MID(**cbMainStr, nPosition)
   END IF
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 05:36:23 PM

Code Select


' ========================================================================================
' Extract characters from a string up to a character or group of characters.
' Complement function to AfxRemain.
' Returns a substring of cbMainStr starting with its first character (or the character
' specified by nStart) and up to (but not including) the first occurrence of cbMatchStr
' If cbMatchStr is not present in cbMainStr (or is null) then all of MainString is
' returned from the nStart position.
' This function is case-sensitive.
' The following line returns "aba" (match on "cad")
' DIM cbs AS CBSTR = AfxExtract(1, "abacadabra","cad")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxExtract (BYVAL nStart AS LONG, BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DIM nLen AS LONG = LEN(cbMainStr)
   IF (nStart = 0) OR (nStart > nLen) THEN
      cbOutStr = ""
      RETURN cbOutStr
   END IF
   IF nStart < 0 THEN nStart = nLen + nStart + 1
   DIM nPos AS LONG = Instr(nStart, cbMainStr, cbMatchStr)
   IF nPos THEN
      cbOutStr = MID(cbMainStr, nStart, nPos - nStart )
   ELSE
      cbOutStr = MID(cbMainStr, nStart)
   END IF
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Extract characters from a string up to a specific character.
' Returns a substring of cbMainStr starting with its first character (or the character
' specified by nStart) and up to (but not including) the first occurrence of cbMatchStr
' cbMatchStr specifies a list of single characters to be searched for individually, a
' match on any one of which will cause the extract operation to be performed up to that character.
' If cbMatchStr is not present in MainString (or is null) then all of cbMainStr is returned.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxExtractAny(1, "abacadabra","cd")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxExtractAny (BYVAL nStart AS LONG, BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>(cbMainStr)
   DIM nLen AS LONG = LEN(cbMainStr)
   IF (nStart = 0) OR (nStart > nLen) THEN
      cbOutStr = ""
      RETURN cbOutStr
   END IF
   IF nStart < 0 THEN nStart = nLen + nStart + 1
   DIM AS LONG i, x
   FOR i = nStart TO LEN(cbMainStr)
      FOR x = 1 TO LEN(cbMatchStr)
         IF MID(cbMainStr, i, 1) = MID(cbMatchStr, x, 1) THEN
            cbOutStr = MID(cbMainStr, nStart, i - nStart)
            RETURN cbOutStr
         END IF
      NEXT
   NEXT
END FUNCTION
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 07:05:34 PM

Code Select


' ========================================================================================
' Returns a string containing only the characters contained in a specified match string.
' All other characters are removed. If cbMatchStr is an empty string the function returns
' an empty string.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxRetain("abacadabra","b")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRetain (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF LEN(cbMainStr) = 0 OR LEN(cbMatchStr) = 0 THEN RETURN cbOutStr
   DIM nPos AS LONG = 1
   DO
      nPos = Instr(nPos, cbMainStr, cbMatchStr)
      IF nPos = 0 THEN EXIT DO
      cbOutStr = **cbOutStr & MID(**cbMainStr, nPos, LEN(cbMatchStr))
      nPos += LEN(cbMatchStr)
   LOOP
   RETURN cbOutStr
END FUNCTION
' ========================================================================================

' ========================================================================================
' Returns a string containing only the characters contained in a specified match string.
' All other characters are removed.
' If cbMatchStr is an empty string the function returns an empty string.
' cbMatchStr specifies a list of single characters to be searched for individually.
' A match on any one of which will cause that character to be removed from the result.
' This function is case-sensitive.
' Usage example:
' DIM cbs AS CBSTR = AfxRetainAny("<p>1234567890<ak;lk;l>1234567890</p>", "<;/p>")
' MessageBoxW 0, cbs, "", MB_OK
' ========================================================================================
PRIVATE FUNCTION AfxRetainAny (BYREF cbMainStr AS CBSTR, BYREF cbMatchStr AS CBSTR) AS CBSTR
   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF LEN(cbMainStr) = 0 OR LEN(cbMatchStr) = 0 THEN RETURN cbOutStr
   DIM AS LONG i, x, nPos
   FOR i = 1 TO LEN(cbMainStr)
      nPos = Instr(cbMatchStr, MID(cbMainStr, i, 1))
      IF nPos THEN cbOutStr = **cbOutStr & MID(cbMainStr, i, 1)
   NEXT
   RETURN cbOutStr
END FUNCTION
'==============================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 08:17:14 PM

Code Select


'==============================================================================
' Shrink a string to use a consistent single character delimiter.
' The purpose of this function is to create a string with consecutive data
' items (words) separated by a consistent single character. This makes it very
' straightforward to parse the results as needed.
' If cbMask is not defined then all leading spaces and trailing spaces are
' removed entirely. All occurrences of two or more spaces are changed to a
' single space. Therefore, the new string returned consists of zer' or more
' words, each separated by a single space character.
' If cbMask is specified, it defines one or more delimiter characters to shrink.
' All leading and trailing mask characters are removed entirely.
' All occurrences of one or more mask characters are replaced with the first
' character of cbMask. The new string returned consists of zero or more words,
' each separated by the character found in the first position of cbMask.
' WhiteSpace is generally defined as the four common non-printing characters:
' Space, Tab, Carriage-Return, and Line-Feed. cbMask = Chr(32,9,13,10)
' Usage example:
' DIM cbs AS CBSTR = AfxShrink(",,, one , two     three, four,", " ,")
' MessageBoxW 0, cbs, "", MB_OK
'==============================================================================
PRIVATE FUNCTION AfxShrink (BYREF cbMainStr AS CBSTR, BYREF cbMask AS CBSTR = " ") AS CBSTR

   DIM cbOutStr AS CBSTR = TYPE<CBSTR>("")
   IF LEN(cbMainStr) = 0 OR LEN(cbMask) = 0 THEN RETURN cbOutStr

   ' Eliminate all leading and trailing cbMask characters
   cbOutStr = TRIM(**cbMainStr, ANY **cbMask)

   ' Eliminate all duplicate sMask characters within the string
   DIM cbReplace AS CBSTR = TYPE<CBSTR>(LEFT(**cbMask, 1))
   DIM cbDuplicate AS CBSTR = TYPE<CBSTR>("")
   DIM nMaskLen AS LONG = LEN(cbMask)
   DIM AS LONG i, nPos
   FOR i = 1 TO nMaskLen
      cbDuplicate = MID(**cbMask, i, 1) & MID(**cbMask, i, 1)   ' usually double spaces
      nPos = 1
      DO
         nPos = Instr(cbOutStr, cbDuplicate)
         IF nPos = 0 THEN EXIT DO
         cbOutStr = Left(**cbOutStr, nPos - 1) & **cbReplace & MID(**cbOutStr, nPos + LEN(cbDuplicate))
      LOOP
   NEXT

   ' Replace all single characters in the mask with the first character of the mask.
   nPos = 1
   DO
      nPos = Instr(nPos, cbOutStr, ANY cbMask)
      IF nPos = 0 THEN EXIT DO
      ' Only do the replace if the character at the position found is
      ' different than the character we need to replace it with. This saves
      ' us from having to do an unneeded string concatenation.
      IF MID(**cbOutStr, nPos, 1) <> **cbReplace  THEN
         cbOutStr = LEFT(**cbOutStr, nPos - 1) & **cbReplace & MID(**cbOutStr, nPos + 1)
      END IF
      nPos += 1
   LOOP

   ' Finally, do a pass to ensure that there are no duplicates of the
   ' first mask character because of the replacements in the step above.
   cbDuplicate = LEFT(**cbMask, 1) & LEFT(**cbMask, 1)
   nPos = 1
   DO
      nPos = Instr(cbOutStr, cbDuplicate)
      IF nPos = 0 THEN EXIT DO
      cbOutStr = LEFT(**cbOutStr, nPos - 1) & **cbReplace & MID(**cbOutStr, nPos + LEN(cbDuplicate))
   LOOP

   RETURN cbOutStr

END FUNCTION
'==============================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 08:32:39 PM

I anticipate that these functions will be too slow with big strings. Not only the overhead of creating new strings, but also the overhead of creating new instances of the class. Probably the way would be to use buffers and pointers for the string manipulation and assign the final result to a CBSTR.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 07, 2016, 08:34:43 PM

Excellent Jose!

Two questions:

(1) Are you comfortable that CBStr can now be used as a complete replacement for WSTRING in our FB programs? If you are, then I will start changing my code in the editor to use this new approach.

(2) Just a thought about the naming convention for two functions you posted above. I think that for consistency, AfxStrInsert should be renamed AfxInsert, and AfxStrReverse should be called AfxReverse. Otherwise, you should probably add the "Str" tag to AfxDelete, AfxExtract, etc...

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 07, 2016, 08:35:48 PM

Quote from: Jose Roca on July 07, 2016, 08:32:39 PM
I anticipate that these functions will be too slow with big strings. Not only the overhead of creating new strings, but also the overhead of creating new instances of the class. Probably the way would be to use buffers and pointers for the string manipulation and assign the final result to a CBSTR.

Sounds like the start of an idea for a new class.... CBStrBuilder. :)

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 09:42:56 PM

Quote
(1) Are you comfortable that CBStr can now be used as a complete replacement for WSTRING in our FB programs? If you are, then I will start changing my code in the editor to use this new approach.

I'm concerned about speed. The compiler and the class do a good work creating and destroying instances of the class transparently. If you have parameters as CBSTR, an instance of the class is created for each one and you can pass strings, wstrings or cbstrings. If the result of the function is a CBSTR, another instance of the class is created and the result copied to it. As if you were working with native dynamic unicode strings but with more overhead.

I would like to rewrite them using wstrings and/or dynamically allocated buffers to do the string manipulation and assign the result to a CBSTR before returning it. This would increase the speed and reduce the overhead. That is, to use CBSTR only when needed to solve the inconvenience of not being able to use the result of a function to pass it directly to another function without having to use intermediate steps to free the memory. Also, in COM programming, to use it for BSTR out parameters.

I would use it as we do with variants: only when needed.

Quote
(2) Just a thought about the naming convention for two functions you posted above. I think that for consistency, AfxStrInsert should be renamed AfxInsert, and AfxStrReverse should be called AfxReverse. Otherwise, you should probably add the "Str" tag to AfxDelete, AfxExtract, etc...

I think that I will use AfxStr, and also add an W to allow for A functions.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 10:21:28 PM

A demonstration of what I mean.

Currently, to implement the function below we have to declare an out WSTRING parameter and dim a WSTRING of enough size and pass it. The problem is that WSTRINGs are of fixed size, so if the WSTRING has not enough size we have a problem, and if we dimension them with big sizes, we waste memory.

We can also declare the return type of the function as WSTRING PTR and return a pointer to a dynamically allocated buffer. The problem is that we can't pass this pointer directly to another function without creating a memory leak. We have to assign the pointer to a variable, pass the variable and free it.

Using CBSTR to return the result...

Code Select


' ========================================================================================
' Gets the text of a window.
' Note: GetWindowText cannot retrieve the text of a control in another application.
' ========================================================================================
FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG, pbuffer AS WSTRING PTR
   nLen = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   pbuffer = CAllocate(nLen + 1, 2)
   nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, pbuffer))
   DIM cbText AS CBSTR = TYPE(CBSTR(LEFT(*pbuffer, nLen)))
   Deallocate pbuffer
   FUNCTION = cbText
END FUNCTION
' ========================================================================================

we can use

Code Select


MessageBoxW hwndMain, AfxGetWindowTextW(hwndMain), "", MB_OK

And we don't have to worry about freeing the memory.

(I have added an operator to allow to pass the handle of the BSTR hosted in the CBSTR class without having to use *).

Code Select


' ========================================================================================
OPERATOR CBStr.CAST () AS ANY PTR
   OPERATOR =  CAST(ANY PTR, m_bstr)
END OPERATOR
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 10:33:42 PM

Instead of allocating a buffer with CAllocate, we can also use:

Code Select


' ========================================================================================
FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM cbText AS CBSTR = TYPE<CBSTR>(SPACE(nLen + 1))
   nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *cbText))
   FUNCTION = LEFT(**cbText, nLen)
END FUNCTION
' ========================================================================================

In this small function, it doesn't matter if we use one method or another. The problem is when we have to do many concatenations.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 07, 2016, 10:43:04 PM

Also, as in FB a BSTR is defined as a pointer to a WSTRING, we can't have an overloaded method for WSTRINGs and another for BSTRs, so there is a problem in the LET operator to know if it is a pointer to one or the another. I have needed to use a trick.

Code Select


' ========================================================================================
OPERATOR CBStr.Let (BYREF bstrHandle AS AFX_BSTR)
   IF bstrHandle = NULL THEN EXIT OPERATOR
   ' Free the current OLE string
   IF m_bstr THEN SysFreeString(m_bstr)
   ' Detect if the passed handle is an OLE string
   ' If it is an OLE string it must have a descriptor; otherwise, don't
   ' Get the length looking at the descriptor
   DIM res AS DWORD = PEEK(DWORD, CAST(ANY PTR, bstrHandle) - 4) \ 2
   ' If the retrieved length is the same that the returned by LEN, then it must be an OLE string
   IF res = .LEN(*bstrHandle) THEN
      ' Attach the passed handle to the class
      m_bstr = bstrHandle
   ELSE
      ' Allocate an OLE string with the contents of the string pointed by bstrHandle
      m_bstr = SysAllocString(*bstrHandle)
   END IF
END OPERATOR
' ========================================================================================

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 07, 2016, 10:50:11 PM

Quote from: Jose Roca on July 07, 2016, 10:33:42 PM
Instead of allocating a buffer with CAllocate, we can also use:

Code Select Expand
' ======================================================================================== FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR DIM nLen AS LONG, pbuffer AS WSTRING PTR nLen = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0) DIM cbText AS CBSTR = TYPE<CBSTR>(SPACE(nLen + 1)) nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *cbText)) FUNCTION = LEFT(**cbText, nLen) END FUNCTION ' ========================================================================================

In this small function, it doesn't matter if we use one method or another.

I like this approach a bit better.

Quote
The problem is when we have to do many concatenations.

I think we should do some tests to determine if there is actually a huge speed problem or not. It may be insignificant or immaterial.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 07, 2016, 11:46:30 PM

Quote from: TechSupport on July 07, 2016, 10:50:11 PM
I think we should do some tests to determine if there is actually a huge speed problem or not. It may be insignificant or immaterial.

I am going to do these tests first thing in the morning.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 08, 2016, 12:12:31 AM

My initial tests show that it gets exponentially slower as the number concatenations increases. A thousand or two is not bad but once enter 10,000 it starts to increase quickly by orders of magnitude. More tests in the morning.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 08, 2016, 12:29:41 AM

Hi Jose,

I am off to bed now but I wanted to post this article that I just started to read. The gist seems to be that SysReAllocStringLen is so much faster than creating a new string and copying but parts into it for an append.
http://technolog.nl/blogs/eprogrammer/archive/2006/07/25/Boost-BSTR-performance-for-free_2C00_-by-3000_2500_.aspx

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:08:09 AM

Don't know if SysReallocStringLen will make a big difference. SysReallocString certainly not.

Source code of SysReallocString:

Code Select


 /******************************************************************************
*      SysReAllocString    [OLEAUT32.3]
*
* Change the length of a previously created BSTR.
*
* PARAMS
*  old [I/O] BSTR to change the length of
*  str [I]   New source for pbstr
*
* RETURNS
*  Success: 1
*  Failure: 0.
*
* NOTES
*  See BSTR(), SysAllocStringStringLen().
*/
INT WINAPI SysReAllocString(LPBSTR old,LPCOLESTR str)
{
   /*
   * Sanity check
   */
   if (old==NULL)
   return 0;

   /*
   * Make sure we free the old string.
   */
   SysFreeString(*old);

   /*
   * Allocate the new string
   */
   *old = SysAllocString(str);

   return 1;
}

SysReallocString does the same that I'm doing: freeing the old string and allocating a new one.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:10:03 AM

SysReallocStringLen is different:

Code Select


/******************************************************************************
 *             SysReAllocStringLen   [OLEAUT32.5]
 *
 * Change the length of a previously created BSTR.
 *
 * PARAMS
 *  old [O] BSTR to change the length of
 *  str [I] New source for pbstr
 *  len [I] Length of oleStr in wide characters
 *
 * RETURNS
 *  Success: 1. The size of pbstr is updated.
 *  Failure: 0, if len >= 0x80000000 or memory allocation fails.
 *
 * NOTES
 *  See BSTR(), SysAllocStringByteLen().
 *  *old may be changed by this function.
 */
int WINAPI SysReAllocStringLen(BSTR* old, const OLECHAR* str, unsigned int len)
{
   /* Detect integer overflow. */
   if (len >= ((UINT_MAX-sizeof(WCHAR)-sizeof(DWORD))/sizeof(WCHAR)))
   return FALSE;
 
   if (*old!=NULL) {
      DWORD newbytelen = len*sizeof(WCHAR);
      bstr_t *old_bstr = bstr_from_str(*old);
      bstr_t *bstr = CoTaskMemRealloc(old_bstr, bstr_alloc_size(newbytelen));
 
      if (!bstr) return FALSE;
 
      *old = bstr->u.str;
      bstr->size = newbytelen;
      /* The old string data is still there when str is NULL */
      if (str && old_bstr->u.str != str) memmove(bstr->u.str, str, newbytelen);
      bstr->u.str[len] = 0;
   } else {
      *old = SysAllocStringLen(str, len);
   }
 
   return TRUE;
}

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:14:35 AM

The BSTR cache can be disabled calling SetOaNoCache.

See: https://msdn.microsoft.com/en-us/library/windows/desktop/ms644360(v=vs.85).aspx

The cache has been the cause of some posts in the PB Forum saying that PB WSTRINGs leaked because they didn't see a reduction of the memory consumption in the Task Manager after the variable went out of scope.

In fact, after calling SysFreeString, you can still access the BSTR if the cache is not disabled, e.g.

Code Select


DIM bs AS BSTR
bs = SysAllocString("pepe")
MessageBoxW 0, bs, "1", MB_OK
SysFreeString bs
MessageBoxW 0, bs, "2", MB_OK

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:34:57 AM

Source code of SysFreeString (Wine version):

Code Select


/******************************************************************************
 *      SysFreeString   [OLEAUT32.6]
 *
 * Free a BSTR.
 *
 * PARAMS
 *  str [I] BSTR to free.
 *
 * RETURNS
 *  Nothing.
 *
 * NOTES
 *  See BSTR.
 *  str may be NULL, in which case this function does nothing.
 */
void WINAPI SysFreeString(BSTR str)
 {
     bstr_cache_entry_t *cache_entry;
     bstr_t *bstr;
     IMalloc *malloc = get_malloc();
     SIZE_T alloc_size;
 
     if(!str)
         return;
 
     bstr = bstr_from_str(str);
 
     alloc_size = IMalloc_GetSize(malloc, bstr);
     if (alloc_size == ~0UL)
         return;
 
     cache_entry = get_cache_entry_from_alloc_size(alloc_size);
     if(cache_entry) {
         unsigned i;
 
         EnterCriticalSection(&cs_bstr_cache);
 
         /* According to tests, freeing a string that's already in cache doesn't corrupt anything.
          * For that to work we need to search the cache. */
         for(i=0; i < cache_entry->cnt; i++) {
             if(cache_entry->buf[(cache_entry->head+i) % BUCKET_BUFFER_SIZE] == bstr) {
                 WARN_(heap)("String already is in cache!\n");
                 LeaveCriticalSection(&cs_bstr_cache);
                 return;
             }
         }
 
        if(cache_entry->cnt < sizeof(cache_entry->buf)/sizeof(*cache_entry->buf)) {
            cache_entry->buf[(cache_entry->head+cache_entry->cnt) % BUCKET_BUFFER_SIZE] = bstr;
            cache_entry->cnt++;

            if(WARN_ON(heap)) {
                unsigned n = (alloc_size-FIELD_OFFSET(bstr_t, u.ptr))/sizeof(DWORD);
                for(i=0; i<n; i++)
                    bstr->u.dwptr[i] = ARENA_FREE_FILLER;
            }

            LeaveCriticalSection(&cs_bstr_cache);
            return;
        }

        LeaveCriticalSection(&cs_bstr_cache);
    }

    CoTaskMemFree(bstr);
}

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:39:14 AM

SysAllocString simply delegates it to SysAllocStringLen:

Code Select


BSTR WINAPI SysAllocString(LPCOLESTR str)
{
    if (!str) return 0;

    /* Delegate this to the SysAllocStringLen method. */
    return SysAllocStringLen(str, lstrlenW(str));
}

And this is the source code for SysAllocStringLen:

Code Select


/******************************************************************************
 *             SysAllocStringLen     [OLEAUT32.4]
 *
 * Create a BSTR from an OLESTR of a given wide character length.
 *
 * PARAMS
 *  str [I] Source to create BSTR from
 *  len [I] Length of oleStr in wide characters
 *
 * RETURNS
 *  Success: A newly allocated BSTR from SysAllocStringByteLen()
 *  Failure: NULL, if len is >= 0x80000000, or memory allocation fails.
 *
 * NOTES
 *  See BSTR(), SysAllocStringByteLen().
 */
BSTR WINAPI SysAllocStringLen(const OLECHAR *str, unsigned int len)
{
    bstr_t *bstr;
    DWORD size;

    /* Detect integer overflow. */
    if (len >= ((UINT_MAX-sizeof(WCHAR)-sizeof(DWORD))/sizeof(WCHAR)))
    return NULL;

    TRACE("%s\n", debugstr_wn(str, len));

    size = len*sizeof(WCHAR);
    bstr = alloc_bstr(size);
    if(!bstr)
        return NULL;

    if(str) {
        memcpy(bstr->u.str, str, size);
        bstr->u.str[len] = 0;
    }else {
        memset(bstr->u.str, 0, size+sizeof(WCHAR));
    }

    return bstr->u.str;
}

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:45:19 AM

And this is the source for SysAllocStringByteLen:

Code Select


/******************************************************************************
 *             SysAllocStringByteLen     [OLEAUT32.150]
 *
 * Create a BSTR from an OLESTR of a given byte length.
 *
 * PARAMS
 *  str [I] Source to create BSTR from
 *  len [I] Length of oleStr in bytes
 *
 * RETURNS
 *  Success: A newly allocated BSTR
 *  Failure: NULL, if len is >= 0x80000000, or memory allocation fails.
 *
 * NOTES
 *  -If len is 0 or oleStr is NULL the resulting string is empty ("").
 *  -This function always NUL terminates the resulting BSTR.
 *  -oleStr may be either an LPCSTR or LPCOLESTR, since it is copied
 *  without checking for a terminating NUL.
 *  See BSTR.
 */
BSTR WINAPI SysAllocStringByteLen(LPCSTR str, UINT len)
{
    bstr_t *bstr;

    /* Detect integer overflow. */
    if (len >= (UINT_MAX-sizeof(WCHAR)-sizeof(DWORD)))
    return NULL;

    bstr = alloc_bstr(len);
    if(!bstr)
        return NULL;

    if(str) {
        memcpy(bstr->u.ptr, str, len);
        bstr->u.ptr[len] = 0;
    }else {
        memset(bstr->u.ptr, 0, len+1);
    }
    bstr->u.str[(len+sizeof(WCHAR)-1)/sizeof(WCHAR)] = 0;

    return bstr->u.str;
}

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:51:44 AM

This is the source of alloc_bstr, called by some functions:

Code Select


static bstr_t *alloc_bstr(size_t size)
{
    bstr_cache_entry_t *cache_entry = get_cache_entry(size);
    bstr_t *ret;

    if(cache_entry) {
        EnterCriticalSection(&cs_bstr_cache);

        if(!cache_entry->cnt) {
            cache_entry = get_cache_entry(size+BUCKET_SIZE);
            if(cache_entry && !cache_entry->cnt)
                cache_entry = NULL;
        }

        if(cache_entry) {
            ret = cache_entry->buf[cache_entry->head++];
            cache_entry->head %= BUCKET_BUFFER_SIZE;
            cache_entry->cnt--;
        }

        LeaveCriticalSection(&cs_bstr_cache);

        if(cache_entry) {
            if(WARN_ON(heap)) {
                size_t fill_size = (FIELD_OFFSET(bstr_t, u.ptr[size])+2*sizeof(WCHAR)-1) & ~(sizeof(WCHAR)-1);
                memset(ret, ARENA_INUSE_FILLER, fill_size);
                memset((char *)ret+fill_size, ARENA_TAIL_FILLER, bstr_alloc_size(size)-fill_size);
            }
            ret->size = size;
            return ret;
        }
    }

    ret = CoTaskMemAlloc(bstr_alloc_size(size));
    if(ret)
        ret->size = size;
    return ret;
}

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:53:26 AM

The code for bstr_from_str, called by some functions, is:

Code Select


static inline bstr_t *bstr_from_str(BSTR str)
{
    return CONTAINING_RECORD(str, bstr_t, u.str);
}

But I don't find the code for CONTAINING_RECORD, that apparently is a macro.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 02:34:59 AM

Because there is some misleading information in the web about BSTRs, this is the description from Microsoft:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms221069(v=vs.85).aspx

A BSTR (Basic string or binary string) is a string data type that is used by COM, Automation, and Interop functions. Use the BSTR data type in all interfaces that will be accessed from script.

C++

Code Select


typedef WCHAR OLECHAR;
typedef OLECHAR* BSTR;
typedef BSTR* LPBSTR;

Remarks

A BSTR is a composite data type that consists of a length prefix, a data string, and a terminator. The following table describes these components.

Code Select


Item	Description
Length prefix	A four-byte integer that contains the number of bytes in the following data string.
                It appears immediately before the first character of the data string.
                This value does not include the terminating null character.
Data string	A string of Unicode characters. May contain multiple embedded null characters.
Terminator	Two null characters.

A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix.

BSTRs are allocated using COM memory allocation functions, so they can be returned from methods without concern for memory allocation.

The following code is incorrect:

Code Select


BSTR MyBstr = L"I am a happy BSTR";

This code builds (compiles and links) correctly, but it will not function properly because the string does not have a length prefix. If you use a debugger to examine the memory location of this variable, you will not see a four-byte length prefix preceding the data string.

Instead, use the following code:

Code Select


BSTR MyBstr = SysAllocString(L"I am a happy BSTR");

A debugger that examines the memory location of this variable will now reveal a length prefix containing the value 34. This is the expected value for a 17-byte single-character string that is converted to a wide-character string through the inclusion of the "L" string modifier. The debugger will also show a two-byte terminating null character (0x0000) that appears after the data string.

If you pass a simple Unicode string as an argument to a COM function that is expecting a BSTR, the COM function will fail.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 03:18:23 AM

The problem are multiple concatenations. Therefore, the way to improve the speed is, obviously, to reduce or eliminate them. We need to allocate a buffer big enough and replace contents instead of concatenate strings. If the buffer is bigger than the final content, a fast way to reduce it is to call SysReallocStringLen, that, because does not need to allocate new memory in this case, will just change the prefix length of the BSTR. You did mention an string builder class...

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 08, 2016, 09:20:48 AM

Quote from: Jose Roca on July 08, 2016, 03:18:23 AM
The problem are multiple concatenations. Therefore, the way to improve the speed is, obviously, to reduce or eliminate them. We need to allocate a buffer big enough and replace contents instead of concatenate strings. If the buffer is bigger than the final content, a fast way to reduce it is to call SysReallocStringLen, that, because does not need to allocate new memory in this case, will just change the prefix length of the BSTR. You did mention an string builder class...

I am working on the string builder class right now.

I do like the idea of allocating a larger buffer for CBSTR strings rather than setting them to exact lengths when they are created. FB strings have a built in buffer (I'd have to look at the FBC code to see exactly how big) and they perform extremely well. The small amount of extra memory allocated to these buffers would be immaterial in the overall grand scheme of things.

Title: Re: AfxStr - Unicode String Functions
Post by: James Fuller on July 08, 2016, 09:52:21 AM

Is the focus here to use BSTR's for all unicode needs in the form of a CBStr wrapper?

I did try Marc's uStringW and I would like to see it included in any bench-marks along with native Fb Strings for a speed comparison.

James
AfxClipLeft -> uswClipLeft

Code Select


#define unicode
#include Once "windows.bi"
#define __VERBOSE_MODE__
#Include Once "Dyn_Wstring.bi"
Function uswClipLeft(uswMain As uStringW,Byval nCount As Long) As uStringW
    Dim As uStringW uswOut = uswMain
    If nCount <= 0 Then
        Return uswOut
    EndIf
    Dim As Long nLen = Len(uswMain)
    nCount = IIF(nLen < nCount,nLen,nCount)
    uswOut = Mid(uswMain,nCount + 1)
    Return uswOut
End Function
Dim As uStringW uswOne = uswClipLeft("abcdefghijk",4)
? uswOne

sleep

Title: Re: AfxStr - Unicode String Functions
Post by: Marc Pons on July 08, 2016, 11:22:14 AM

hi james

to verify speed , you sould use it without __VERBOSE_MODE__
because print will slow drasticaly the action , that verbose define is more to debugg and see behind the curtain...

you even dont need UNICODE define , nor windows.bi

Code Select


'#define unicode
'#include Once "windows.bi"
'#define __VERBOSE_MODE__
#Include Once "Dyn_Wstring.bi"
Function uswClipLeft(uswMain As uStringW,Byval nCount As Long) As uStringW
    Dim As uStringW uswOut = uswMain
    If nCount <= 0 Then
        Return uswOut
    EndIf
    Dim As Long nLen = Len(uswMain)
    nCount = IIF(nLen < nCount,nLen,nCount)
    uswOut = Mid(uswMain,nCount + 1)
    Return uswOut
End Function
Dim As uStringW uswOne = uswClipLeft("abcdefghijk",4)
? uswOne

sleep

and you could directly use the u_mid function

you are only testing ascii codes , it will be more fun with real unicode values
it is why, i use in my tests euro symbol to be more representative... \u20AC

last remark , if speed is very important you also can comment __U_CLEAN_MEM__ option in the Dyn_Wstring.bi

Code Select

#ifndef __U_CLEAN_MEM__                       ' to "free" the remaining allocated memory when program ends
      '#define __U_CLEAN_MEM__                    ' if no used you can reduce around 2048 bytes on your executable
   #endif                                        ' to not compile, simply comment the define

the pseudo linked list will not be active , so better speed...
the type destructor is normally suffisant for automatic free ( if some remain, no problem , they will be cleaned when progr exits)

I have let that option for "cleaner/nicer" coding only, and because i was not completely sure at beginning how to do

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 11:29:27 AM

> Is the focus here to use BSTR's for all unicode needs in the form of a CBStr wrapper?

The class is lightweight and without complexities, and thanks to the use of temporary types, we can use it almost as if it was a native type.

It is not the class what causes speed problems, but string concatenations. And this problem happens no matter if you use a class or a native type. This is why Bob implemented an string builder class in PB.

In my first versions of the TypeLib Browser, I used the easy way of multiple string concatenations and it was slow when parsing big type libraries such Excel. I wrote a procedure that used a global string of 1 MB (as it was to be only used by this application there was not need for further complications) and suddenly became the fatest COM browser available.

In all languages, the MID statement or its equivalent is fast because it just replaces the contents, without memory allocations/reallocations (only using assembler or pointers to avoid the overhead of calling a function can improve it), whereas the LEFT, MID and RIGHT functions are slow because they create new temporary strings. So if you use MID(s, 5, 4) = "Paul", it is fast, but s = LEFT(s, 4) & "Paul" & MID(s, 9) is slow becase it creates temporary strings and also has to allocate a new one to store the concatenation of them and deallocate the old one. If you do it repeteadly or the strings are very big it can become painfully slow.

Title: Re: AfxStr - Unicode String Functions
Post by: Marc Pons on July 08, 2016, 12:39:16 PM

Jose, Paul ...

excuse me in advance, if my remarks are not relevant, or if i'm interfering too much

as James is asking , but in more brutal form : is it needed to go via the BSTR story for unicode ?
( for me BSTR seem not so fast/easy way)

all the points you said about concatenation ... is true, it is time consuming but on practice, the normal variable fb string type is already doing the job quite well , just by avoiding to allocate/ reallocate for each byte , it will do it by steps (32 as i remember)
i've already digged a lot on the string manipulation functions to make my own lib for replace/ split ... and compare at that time with PowerBasic ( comparable results for fb when using pointers),
and it is the same model i am experiencing (by 16 in the post) in my uStringW , it could be changed to 32 or 48 or more ( just to balance speed vs size), the real size allocation and the real len are included in the type definition to ease that purpose

And last point, if the deal is to work with real unicode char (not ansi with 2 bytes) , the complexity is not as that level , it is at the surrogate pair state, that's why i added the information on how many surrogate are in the uStringW to bypass when not needed that botleneck
waiting your answer

note : I sure also BSTR are also needed for COM, i'am just not sure it is relevant for unicode variable type

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 01:44:05 PM

My main purpose is to use it with COM, so I need to work with real BSTRs. There is not problem to pass unicode content to APIs that simply expect an unicode string, but COM needs BSTRs.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 08, 2016, 05:31:49 PM

Quote from: TechSupport on July 08, 2016, 09:20:48 AM

I am working on the string builder class right now.

I didn't get as much time today to work on the code as I had hoped but I have just been able to finish the "add" portion of the string builder class. It uses overload functions to add CBSTR, Ansi/STRING, or WSTRING strings. It returns a CBSTR.

A simple test of 40,000 string concatenations of "Paul Squires" shows this dramatic difference:

CBSTR: 24.8125 seconds
FB Strings: 0 seconds
StringBuilder Class: 0.0117 seconds

Not bad so far.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 08, 2016, 05:42:19 PM

...and this is very preliminary code for the stringbuilder class. Most are just stubs/placeholders at this point. The Add is working.

Jose, take a look at the very last function. Did you say in another post that you wouldn't need to use <type> if assigning directly to a CBSTR as the return value of a function?

Edit: Final code can now be found in this thread: http://www.planetsquires.com/protect/forum/index.php?topic=3892.0

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 06:46:11 PM

> Did you say in another post that you wouldn't need to use <type> if assigning directly to a CBSTR as the return value of a function?

Yes, I did. As the returning type isn't a plain structure, but a class, both the constructor of the class and the LET operator (if the class has an overloaded LET constructor) are called, so the constructor creates an empty BSTR and the LET operator deletes it and creates a new BSTR with the contents pointed by the returned handle.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 06:55:00 PM

I didn't kinow that returning a TYPE will cause a call to the constructor and LET operator of that class. Now that I know it, seems logical, but I was not used to it, because returning a TYPE (structure) in PB just returns an array of bytes. Maybe if we were using the word CLASS instead of TYPE I would have figured it sooner.

I wonder why some of the most useful features of this compiler aren't well explained or not explained at all.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 08, 2016, 07:02:51 PM

> A simple test of 40,000 string concatenations of "Paul Squires" shows this dramatic difference:

The bottlenecks are always the string concatenations.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 10, 2016, 12:06:16 AM

Jose, do these functions (AfxStr.inc) need to be modified again now that you have modified the CBSTR class?

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 10, 2016, 12:13:56 AM

Oh, yes. It was mainly a test. The ones that are concatenation intensive will benefit from using the new string builder class.

Many of the code that I post is intended to exchange ideas and, very often, after I post it other ideas come to my mind. The "official" code is the one that I upload in the CWindow package.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 10, 2016, 12:17:29 AM

When we will finish all the work, maybe all these posts should be archived, to no confuse the users.

Title: Re: AfxStr - Unicode String Functions
Post by: Paul Squires on July 10, 2016, 12:38:29 AM

Quote from: Jose Roca on July 10, 2016, 12:17:29 AM
When we will finish all the work, maybe all these posts should be archived, to no confuse the users.

Yes, that's a very good idea.

Title: Re: AfxStr - Unicode String Functions
Post by: José Roca on July 12, 2016, 08:01:38 AM

Paul's string functions adapted to the new CWSTR data type.

PlanetSquires Forums

Support Forums => General Board => Topic started by: José Roca on July 07, 2016, 01:35:30 AM