CBSTR StringBuilder Class

Started by Paul Squires, July 09, 2016, 11:45:45 PM

Marc Pons

José,

I think you forgot to add the LEN operator. Without it, LEN returns the sizeof(type), not the length in wide characters.

Here is my proposal:

' ========================================================================================
' The number of characters currently stored in the class is returned as a Long value.
' ========================================================================================

FUNCTION CWstr.Len () AS LONG
   FUNCTION = m_BufferLen \ 2     ' buffer is wide characters (2 bytes each)
END FUNCTION

OPERATOR Len(BYREF cws AS CWSTR) AS LONG
   OPERATOR = cws.len()
END OPERATOR
' ========================================================================================
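
To illustrate the difference between the built-in behavior and an overloaded LEN, here is a minimal sketch. WideBuf and its fields are hypothetical stand-ins for illustration, not the real CWSTR; the sketch assumes the buffer length is tracked in bytes, as in the proposal above.

```freebasic
' Hypothetical type for illustration only; not the real CWSTR.
TYPE WideBuf
   m_BufferLen AS LONG     ' bytes currently in use (assumption)
   m_Capacity  AS LONG     ' bytes allocated (assumption)
END TYPE

' Global LEN overload: reports wide characters instead of the size of the type.
OPERATOR LEN (BYREF wb AS WideBuf) AS LONG
   RETURN wb.m_BufferLen \ 2     ' 2 bytes per wide character on Windows
END OPERATOR

DIM wb AS WideBuf
wb.m_BufferLen = 8               ' pretend four wide characters are stored

PRINT "SIZEOF(wb) = "; SIZEOF(wb)   ' size of the type itself
PRINT "LEN(wb)    = "; LEN(wb)      ' number of wide characters, via the overload
SLEEP
```

Commenting out the OPERATOR block should make LEN(wb) fall back to the size of the type, which is the behavior described above.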

Paul Squires

Holy crap! I missed almost this entire conversation today! :-)  Been busy at work and am only now checking the forums. It is awesome that my little string builder class has taken on a life of its own and that Jose was able to transform it into its own data type. I will download the new code and try it. IIRC, there is a large default capacity buffer (16K), so you might want to make the default somewhat smaller. I know that it can be overridden, but most of the time we won't bother doing that, and 16K per string seems a bit overdone.
Paul Squires
PlanetSquires Software

José Roca

I had the feeling that it was too good to be true. Apparently returning a CBSTR works because, by default, Windows caches BSTRs, so the BSTR used in the class is still accessible after it has been freed with SysFreeString and can be copied.

But this stuff of temporary types isn't working as we thought. We (or I) have misunderstood it.

The problem is that when we do FUNCTION = <our type> it first calls the destructor of our type and then calls the constructor of the temporary copy to be returned. Therefore the memory of the type to be returned has already been released and can't be copied to the temporary type returned. This is why it GPFs if we return a CWSTR using FUNCTION = CWStr, and not if we use FUNCTION = CWStr.Str, that creates a CBSTR.

To work, it should call the constructor of the temporary type to be returned before calling the destructor of the type that we intend to return, that is what I thought it was doing.

The documentation says that "The Constructor for the type, if there is one, will be called when the temporary copy is created. And the Destructor for the type, if there is one, will be called immediately after its use.", but what I'm seeing when using FUNCTION = CWStr is that the destructor for CWStr is being called before the constructor for the temporary string to be returned.

This is a show stopper. It is not safe to build a framework that relies on the Windows cache for BSTRs, because it will fail if the cache is disabled.

It doesn't make sense to me to call the destructor of the type to be returned before calling the constructor of the target type. It should be the opposite. Otherwise, we haven't the opportunity to copy the data to the target type.

Paul Squires

Quote from: Jose Roca on July 11, 2016, 10:10:56 PM
It doesn't make sense to me to call the destructor of the type to be returned before calling the constructor of the target type. It should be the opposite. Otherwise, we haven't the opportunity to copy the data to the target type.

That doesn't make sense at all. Maybe try putting in some debug print statements to verify 100% of the order of construction/destruction?
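
A minimal sketch of such a trace, for anyone who wants to check the order on their own compiler version. Tracer is a hypothetical type made up for this test (it is not part of CBStr.inc); it only prints a line from each constructor, the destructor and the assignment operator, so the two ways of returning a value can be compared side by side.

```freebasic
' Hypothetical type used only to trace the order of calls.
TYPE Tracer
   DECLARE CONSTRUCTOR ()
   DECLARE CONSTRUCTOR (BYREF rhs AS Tracer)
   DECLARE DESTRUCTOR ()
   DECLARE OPERATOR LET (BYREF rhs AS Tracer)
   m_id AS LONG
END TYPE

CONSTRUCTOR Tracer ()
   PRINT "  Default constructor"
END CONSTRUCTOR

CONSTRUCTOR Tracer (BYREF rhs AS Tracer)
   m_id = rhs.m_id
   PRINT "  Copy constructor"
END CONSTRUCTOR

DESTRUCTOR Tracer ()
   PRINT "  Destructor"
END DESTRUCTOR

OPERATOR Tracer.LET (BYREF rhs AS Tracer)
   m_id = rhs.m_id
   PRINT "  Assignment (LET)"
END OPERATOR

' Returns the local through FUNCTION =
FUNCTION UseFunctionAssign () AS Tracer
   DIM t AS Tracer
   FUNCTION = t
END FUNCTION

' Returns the local through RETURN
FUNCTION UseReturn () AS Tracer
   DIM t AS Tracer
   RETURN t
END FUNCTION

PRINT "--- FUNCTION = ---"
SCOPE
   DIM a AS Tracer = UseFunctionAssign()
END SCOPE

PRINT "--- RETURN ---"
SCOPE
   DIM b AS Tracer = UseReturn()
END SCOPE

SLEEP
```

The printed sequences under each heading show which member is called first in each case; the exact output may depend on the compiler version, so it is not listed here.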

José Roca

That is what I have done, and the destructor of the type to be copied is called before the constructor of the target type.

A simple test:


FUNCTION Foo () AS CWSTR
   DIM wszText AS CWSTR = "Test string"
   FUNCTION = wszText
END FUNCTION


being called as


DIM cws AS CWSTR = Foo


This is the sequence:

When I do DIM wszText AS CWSTR = "Test string"

This constructor is called:


CONSTRUCTOR CWstr (BYREF ansiStr AS STRING = "", BYVAL nCodePage AS LONG = 0)


That calls


PRIVATE FUNCTION CWstr.ResizeBuffer (BYVAL nValue AS LONG) AS LONG
FUNCTION CWstr.Add (BYREF ansiStr AS STRING, BYVAL nCodePage AS LONG = 0) AS LONG
PRIVATE FUNCTION CWstr.AppendBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG) AS LONG


But when I do


FUNCTION = wszText


The sequence is:


CWSTR Destructor
CONSTRUCTOR CWstr (BYREF cws AS CWSTR)
PRIVATE FUNCTION CWstr.ResizeBuffer (BYVAL nValue AS LONG) AS LONG
FUNCTION CWstr.Add (BYREF cws AS CWSTR) AS LONG
OPERATOR CWstr.CAST () AS ANY PTR
PRIVATE FUNCTION CWstr.AppendBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG)


Notice that the first thing that it does is to call the CWSTR destructor.

How am I going to copy its contents if the type has already been destroyed?

The CBSTR works because the BSTR has been cached by Windows. So even if the type has been destroyed and the BSTR freed, it can still be accessed. This made me think that it was working.

But if I call the Foo function with DIM cws AS CWSTR = Foo, it GPFs. This is what made me think that something was not working as it should.

With this behavior, it is not possible to return types from a function, unless they are simple types containing scalar values. In this case, FB does a direct copy.

The Foo function works if I change the return type to AS STRING.


FUNCTION Foo () AS STRING
   DIM wszText AS CWSTR = "Test string"
   FUNCTION = wszText
END FUNCTION


being called as


DIM cws AS CWSTR = Foo


Because it copies the CWSTR buffer to the string BEFORE destroying CWSTR.

But the problem is that this can't be used with Unicode, because the text is automatically converted to ANSI.

I think that this behavior is wrong and should be changed; otherwise returning types is useless.

José Roca

But wait, I have found the solution! At least that is what I hope.

If I use RETURN instead of FUNCTION =, it works.


FUNCTION Foo () AS CWSTR
   DIM wszText AS CWSTR = "Test string"
   RETURN wszText
END FUNCTION


José Roca

Apparently, when using FUNCTION =, the assignment is done after the type has gone out of scope.

But when using RETURN, the assignment is done BEFORE the type goes out of scope.

So the sequence becomes:


CONSTRUCTOR CWstr (BYREF cws AS CWSTR)
PRIVATE FUNCTION CWstr.ResizeBuffer (BYVAL nValue AS LONG) AS LONG
FUNCTION CWstr.Add (BYREF cws AS CWSTR) AS LONG
OPERATOR CWstr.CAST () AS ANY PTR
PRIVATE FUNCTION CWstr.AppendBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG)
CWSTR Destructor


That is what FUNCTION = should do.

So the solution is to use RETURN instead of FUNCTION =.

José Roca


Paul Squires

Wow. I would never have thought there would be such a subtle difference! Awesome that you found a working solution!
:)

José Roca

If this is not a bug, I don't know what it is.

José Roca

Quote from: TechSupport on July 12, 2016, 12:16:45 AM
Wow. I would never have thought there would be such a subtle difference! Awesome that you found a working solution!
:)


Neither would anybody else. The documentation says that using RETURN is like calling FUNCTION = value : EXIT FUNCTION.

But even if I use


FUNCTION Foo3 () AS CWSTR
   DIM wszText AS CWSTR = "Test string"
   FUNCTION = wszText
   EXIT FUNCTION
END FUNCTION


The destructor of wszText is still being called BEFORE the constructor of the temporary type.

Using RETURN, it works as it should.

I guess that C programmers always use RETURN, and nobody has tested FUNCTION = with types like the ones that we are using.

Well, now I have to search for every FUNCTION = and replace it with RETURN.

For a moment, I thought that I would have to throw away all the work.

José Roca

Quote from: Marc Pons on July 11, 2016, 03:48:42 PM
José,

I think you forgot to add the LEN operator. Without it, LEN returns the sizeof(type), not the length in wide characters.

Here is my proposal:

' ========================================================================================
' The number of characters currently stored in the class is returned as a Long value.
' ========================================================================================

FUNCTION CWstr.Len () AS LONG
   FUNCTION = m_BufferLen \ 2     ' buffer is wide characters (2 bytes each)
END FUNCTION

OPERATOR Len(BYREF cws AS CWSTR) AS LONG
   OPERATOR = cws.len()
END OPERATOR
' ========================================================================================


Hi Marc,

Yes, I will do that. I modified Paul's code, where it was implemented as tLeft, and forgot to change it to an operator. Thanks very much for noticing it.

José Roca

I have replaced "FUNCTION =" with "RETURN", removed the Left function, added this operator


OPERATOR LEN (BYREF cws AS CWSTR) AS LONG
   CBSTR_DP("CWSTR OPERATOR LEN")
   OPERATOR = .LEN(**cws)
END OPERATOR


and modified


' ========================================================================================
FUNCTION CWstr.Add (BYREF cws AS CWSTR) AS LONG
   CBSTR_DP("***** CWSTR Add 2 - LEN = " & WSTR(LEN(*cast(WSTRING PTR, @cws))))
   ' Incoming string is already in wide format, simply copy it to the buffer.
   DIM AS LONG nLenString = LEN(*cast(WSTRING PTR, @cws))
   IF nLenString = 0 THEN RETURN 0
   ' Copy the string into the buffer and update the length
   this.AppendBuffer(cast(ANY PTR, cws), nLenString * 2)
   RETURN 0
END FUNCTION
' ========================================================================================


I made this change because I modified Paul's code to mark the end of the string with a double null, so that it can be dereferenced with a pointer, since CWSTR is now a null-terminated data type instead of a helper string builder class.

This is the modified example of Paul:


'#define _CBSTR_DEBUG_ 1
#include once "CBStr.inc"
using Afx.CBStrClass

dim sb as CWSTR
dim cbs as CBSTR = "Paul"

for i as long = 1 to 100
   sb.Add cbs
next

print sb.Str

Print
print "String Length: "; LEN(sb)
Print "Capacity: "; sb.Capacity

Print
Print "First 5 character values before..."
For i As Long = 1 To 5
   Print sb.Char( i );
Next
Print

Print
Print "Change the first 5 character values..."
sb.Char(1) = 74
sb.Char(2) = 111
sb.Char(3) = 115
sb.Char(4) = 101
sb.Char(5) = 33

Print
Print "First 5 character values after..."
For i As Long = 1 To 5
   Print sb.Char( i );
Next
Print
Print
print sb.Str
print

Print
Print "Now delete the First 5 characters in the buffer..."
Print
print "String Length before: "; LEN(sb)
sb.DelChars( 1, 5 )
print "String Length after: "; LEN(sb)
print
print sb.Str; "************"
print

Print
Print "Clear the buffer and add some new text before doing an insert"
sb.Clear
sb.Add "12345678901234567890123456789012345678901234567890"
Print "Now insert 'PlanetSquires' (Len=13) starting at position 5..."
Print
print "String Length before: "; LEN(sb)
print sb.Str
cbs = "PlanetSquires"
sb.Insert( cbs, 5 )
print
print sb.Str
print "String Length after: "; LEN (sb)
print

print "Press any key..."
sleep


New file attached. I hope it works fine and that this is the end of the nasty surprises. All's well that ends well.

José Roca

Quote
IIRC, there is a large default capacity buffer (16K) so you might want to make the default somewhat smaller. I know that it can be overloaded but most times we won't bother doing that and 16K per string seems a bit over done.

I forgot your remark. What size do you suggest?

You know that when calling the API functions we have to specify the size most of the time. The advantage of using this type (CWSTR) is that the length can be specified dynamically, not at compile time as with WSTRINGs. As you know, one of the nastier problems with null-terminated strings is that when we don't know the size in advance we have to allocate a buffer with, e.g., Allocate, CAllocate, etc., instead of using a WSTRING. With CWSTR we can even use a variable to specify the size.
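
For comparison, this is roughly what the manual approach looks like when the size is only known at run time. It is a sketch under stated assumptions: PrintWindowCaption is a hypothetical helper written for this post, it is Windows only, and it uses the standard GetWindowTextLengthW and GetWindowTextW API calls with a buffer obtained from CAllocate.

```freebasic
#include once "windows.bi"

' Hypothetical helper: prints a window caption using a manually allocated buffer.
SUB PrintWindowCaption (BYVAL hwnd AS HWND)
   ' Ask for the length first, because it is not known at compile time
   DIM nLen AS LONG = GetWindowTextLengthW(hwnd)
   ' Allocate room for the text plus the terminating null (zero-filled)
   DIM pwsz AS WSTRING PTR = CAllocate(nLen + 1, SIZEOF(WSTRING))
   IF pwsz = 0 THEN EXIT SUB
   GetWindowTextW(hwnd, pwsz, nLen + 1)
   PRINT *pwsz
   ' The buffer must be released by hand; forgetting this leaks memory
   Deallocate(pwsz)
END SUB

PrintWindowCaption(GetForegroundWindow())
SLEEP
```

The allocate, use and deallocate steps are the part that a dynamic type such as CWSTR is meant to hide.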


José Roca

We can use CBSTR or CWSTR in the same way.


' ========================================================================================
PRIVATE FUNCTION AfxGetWindowText (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM wszText AS CBSTR = SPACE(nLen + 1)
   SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *wszText))
   RETURN wszText
END FUNCTION
' ========================================================================================

' ========================================================================================
PRIVATE FUNCTION AfxGetWindowText (BYVAL hwnd AS HWND) AS CWSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM wszText AS CWSTR = SPACE(nLen + 1)
   SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *wszText))
   RETURN wszText
END FUNCTION
' ========================================================================================


For functions like this one, that don't perform concatenations, I would use CBSTR, and reserve the use of CWSTR for operations that use many concatenations.