CWindow RC 23

Started by José Roca, October 03, 2016, 03:04:12 PM


Marc Pons

This is not a complete answer.

Why does the operator "+" work correctly in that case?
Because that operator expects a string after it, and the compiler uses the implicit cast implemented by the CWSTR class to do the job.

In reality, the conversion is done, but the user does not see it.

José Roca

Quote
i think you can define a new overload operator "&" to use cwstr directly  without **cwsText
for string result


Operator & (ByRef ust1 as string , ByVal cwst2 as CWSTR) as string
      return ust1 & **cwst2  'using here the implicit cast to string to convert cwst2 content to string
'or      return ust1 & str(**cwst2) 'explicit conversion
end operator


easier way but more operations !
could also be duplicated to CWSTR  or WSTRING

No way. Converting it to ansi with STR will screw the unicode contents. Better to use +.

José Roca

Fixed a small bug in one of the Add methods of CWSTR. I didn't notice it before because I was not using UTF-8.


PRIVATE SUB CWstr.Add (BYREF ansiStr AS STRING, BYVAL nCodePage AS UINT = 0)
   CWSTR_DP("CWSTR Add STRING Code page = " & WSTR (nCodePage))
'   DIM AS LONG nLenString = .LEN(ansiStr) * 2
'   IF nLenString = 0 THEN RETURN
   IF LEN(ansiStr) = 0 THEN RETURN
   ' Create the wide string from the incoming ansi string
   DIM pbstr AS BSTR
   IF nCodePage = 0 THEN nCodePage = m_CodePage
   IF nCodePage = CP_UTF8 THEN
      DIM dwLen AS DWORD = MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), LEN(ansiStr), NULL, 0)
      IF dwLen THEN
         pbstr = SysAllocString(WSTR(SPACE(dwLen)))
         ' The last parameter is the size of the output buffer in characters, not bytes
         MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), LEN(ansiStr), pbstr, dwLen)
      END IF
   ELSE
      pbstr = SysAllocString(WSTR(ansiStr))
      MultiByteToWideChar(m_CodePage, MB_PRECOMPOSED, STRPTR(ansiStr), -1, pbstr, LEN(ansiStr) * 2)
   END IF
   IF pbstr THEN
      ' Copy the string into the buffer and update the length
'      this.AppendBuffer(pbstr, nLenString)
      this.AppendBuffer(pbstr, SysStringLen(pbstr) * 2)
      SysFreeString(pbstr)
   END IF
END SUB


When used with UTF-8, it was computing the length to append from the UTF-8 byte count instead of from the converted UTF-16 string.
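
A minimal sketch of why the two lengths differ, assuming a Windows build with windows.bi available. The sample text and the variable names (utf8, nWide) are made up for this illustration and are not part of CWSTR.

```freebasic
#include once "windows.bi"

' "été" encoded as UTF-8: each "é" (U+00E9) takes two bytes, &hC3 &hA9.
DIM utf8 AS STRING = CHR(&hC3, &hA9) & "t" & CHR(&hC3, &hA9)

' Ask the API how many UTF-16 code units the converted text needs
' (output buffer NULL and size 0 means "only report the required size").
DIM nWide AS LONG = MultiByteToWideChar(CP_UTF8, 0, STRPTR(utf8), LEN(utf8), NULL, 0)

PRINT "UTF-8 bytes:       "; LEN(utf8)       ' 5
PRINT "UTF-16 code units: "; nWide           ' 3
PRINT "UTF-16 bytes:      "; nWide * 2       ' 6
PRINT "LEN(utf8) * 2:     "; LEN(utf8) * 2   ' 10, too many bytes for this text
```

With non-ASCII UTF-8 input, LEN(ansiStr) * 2 overstates the number of UTF-16 bytes, which is why taking the length from the converted string is safer.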

José Roca

The problem with the intrinsic FB functions that don't work directly with CWSTR seems to be that they don't trigger the casting operators of the class.

AfxMsg LEFT(cws, 10) fails with an ambiguous call error, but AfxMsg LEFT("" & cws, 10) works. The concatenation triggers the cast operator.

Marc Pons

Quote from: Jose Roca on October 10, 2016, 02:13:02 PM



No way. Converting it to ansi with STR will screw the unicode contents. Better to use +.
That's true, but as I also said, it can be extended to WSTRING and CWSTR.
I hope it works like this (not tested!):

Operator & (ByRef wst1 as wstring , ByVal cwst2 as CWSTR) as wstring
      return wst1 & **cwst2
end operator
Operator & (ByRef cwst1 as CWSTR , ByVal cwst2 as CWSTR) as CWSTR
      return cwst1 & **cwst2
end operator


I know + works, but it is not coherent with the FreeBASIC convention:
+ is normally used to concatenate strings or wstrings only
& is a helper that converts to "string" and concatenates

Marc Pons

Quote
AfxMsg LEFT(cws, 10) fails with an ambiguous call error, but AfxMsg LEFT("" & cws, 10) works. The concatenation triggers the cast operator.
That is normal behaviour. The casts in your CWSTR class are the following:
' ========================================================================================
' Returns a pointer to the CWSTR buffer.
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () BYREF AS WSTRING
   CWSTR_DP("CWSTR CAST BYREF AS WSTRING - buffer: " & WSTR(m_pBuffer))
   OPERATOR = *cast(WSTRING PTR, m_pBuffer)
END OPERATOR
' ========================================================================================
' ========================================================================================
PRIVATE OPERATOR CWstr.CAST () AS ANY PTR
   CWSTR_DP("CWSTR CAST ANY PTR - buffer: " & WSTR(m_pBuffer))
   OPERATOR = cast(ANY PTR, m_pBuffer)
END OPERATOR
' ========================================================================================


The cast to ANY PTR is probably what causes the ambiguity here.

José Roca

Implementing & as an overloaded operator for CWSTR works, of course (I have tried it before):


PRIVATE Operator & (ByRef cwst1 as CWSTR , ByRef cwst2 as CWSTR) as CWSTR
   CWSTR_DP("CWSTR Operator &")
   RETURN **cwst1 & **cwst2
END OPERATOR


but it is pretty inefficient, and my goal is to be as fast as possible.

A line like this


cws = "Line " & WSTR(1) & ", Column " &  WSTR(2) & ": " & **cwsText


generates this trace code


CWSTR OPERATOR * buffer: 6632152
CWSTR LET WSTRING PTR
CWSTR Clear
CWSTR ResizeBuffer - Value = 27
CWSTR ResizeBuffer - pNewBuffer = 6631952 - old buffer = 6632688
CWSTR Add WSTRING
CWSTR AppendBuffer 0 54
CWSTR ResizeBuffer - Value = 108
CWSTR ResizeBuffer - pNewBuffer = 6632688 - old buffer = 6631952
--END - CWSTR AppendBuffer 54


Using the operator +


cws = "Line " & WSTR(1) + ", Column " &  WSTR(2) & ": " + cwsText


generates this trace code


CWSTR CAST BYREF AS WSTRING - buffer: 2634456
CWSTR LET WSTRING PTR
CWSTR Clear
CWSTR ResizeBuffer - Value = 27
CWSTR ResizeBuffer - pNewBuffer = 2634288 - old buffer = 2634992
CWSTR Add WSTRING
CWSTR AppendBuffer 0 54
CWSTR ResizeBuffer - Value = 108
CWSTR ResizeBuffer - pNewBuffer = 2634992 - old buffer = 2634288
--END - CWSTR AppendBuffer 54


Changing it to


cws = "Line " & WSTR(1) & ", Column " &  WSTR(2) & ": " & cwsText


with the overloaded & operator implemented, generates this trace code


+++BEGIN- CWSTR CONSTRUCTOR WSTRING - 4255888
CWSTR ResizeBuffer - Value = 520
CWSTR ResizeBuffer - pNewBuffer = 8533768 - old buffer = 0
CWSTR Add WSTRING
CWSTR AppendBuffer 0 36
--END - CWSTR AppendBuffer 36
-END- CWSTR CONSTRUCTOR WSTRING - 8533768
CWSTR Operator &
CWSTR OPERATOR * buffer: 8532696
CWSTR OPERATOR * buffer: 8533768
+++BEGIN- CWSTR CONSTRUCTOR WSTRING - 8532432
CWSTR ResizeBuffer - Value = 520
CWSTR ResizeBuffer - pNewBuffer = 8534304 - old buffer = 0
CWSTR Add WSTRING
CWSTR AppendBuffer 0 54
--END - CWSTR AppendBuffer 54
-END- CWSTR CONSTRUCTOR WSTRING - 8534304
CWSTR LET CWSTR
CWSTR Clear
CWSTR OPERATOR * buffer: 8534304
CWSTR OPERATOR LEN - len: 27
CWSTR OPERATOR * buffer: 8534304
CWSTR ResizeBuffer - Value = 27
CWSTR ResizeBuffer - pNewBuffer = 8532432 - old buffer = 8533232
CWSTR OPERATOR @ - buffer: 8534304
CWSTR Add CWSTR - LEN = 27
CWSTR OPERATOR @ - buffer: 8534304
CWSTR CAST ANY PTR - buffer: 8534304
CWSTR AppendBuffer 0 54
CWSTR ResizeBuffer - Value = 108
CWSTR ResizeBuffer - pNewBuffer = 8532472 - old buffer = 8532432
--END - CWSTR AppendBuffer 54
***CWSTR DESTRUCTOR - buffer: 8533768
***CWSTR DESTRUCTOR - buffer: 8534304


Both ** and + generate identical code, except the first line: ** calls the operator * ( CWSTR OPERATOR * buffer: 6632152 ) and + calls the operator CAST ( CWSTR CAST BYREF AS WSTRING - buffer: 2634456 ).

The overloaded operator & has to create three instances of the CWSTR class, two to concatenate and another one to return the result.

We have not worked so hard to get a superfast unicode dynamic string only to spoil it by using inefficient techniques.

Marc Pons

Jose

Sometimes I have trouble understanding or explaining, probably because English is not my native language :'(

Let me summarize my point:

     I'm not saying "+" or "**" is wrong; they work perfectly and do the job, so they are a perfectly valid solution.
     I'm just saying "&" could also work, and for me the most important word is: also.
     I think it is better to have that solution too than to have compilation problems.
     I said in my previous posts that using "&" would be:
Quote
easier way but more operations !
I also said:
Quote
I know + is working but it is not coherent with the freebasic convention
+ normally to concatenate string or wstrings only
& helper to convert to "string" and concatenate

Personally, I normally use, without thinking about it, "+" for addition and "&" for concatenation... probably a habit coming from Fortran.
I'm sure I'm not the only one doing that!

One last point: you are showing the extra operations that the overloaded "&" incurs,
but the way you wrote the overloaded "&" to test it could probably be optimized.
As we are "derrière le rideau" / "behind the curtain", at the operator level we could work directly with the class functions...

In the end, it is not a big problem: this kind of overloaded operator does not need to be part of the class;
it can be defined anywhere in the code, as you wish.

You have probably also understood that, even though I'm far from having your knowledge, I like to go into the details:
the speed of concatenation was one of the most important aspects for me, if you remember our previous exchanges.

I sincerely hope you will not take this post as criticism of any sort.
I am just trying to participate, showing at least my interest,
and in the end supporting as much as I can the very important work you are doing.

And please, everyone, excuse my wording if the syntax, grammar or tense are not perfect...

Marc



José Roca

I appreciate any comments and suggestions, but while trading some speed for ease of use is acceptable or even convenient in many cases, in others it is not.

For example, in a procedure that must return a result, it is usually faster to pass a variable by reference to receive the result than to return it as the result of a function. If we have to return a small string, it doesn't matter, but if we are using big strings, it does. There was an initial version of the class in which I overloaded the & operator, and it slowed string concatenations considerably. This is why I used Paul's string builder code, added several changes, like marking the end of the string with a double null to make it compatible with the FB intrinsic string functions, and removed the operators except += and &=. Overloading the & operator is a speed killer. Also, as you have pointed out, it is not mandatory to add it to the class. You can add it anywhere in your code.
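
As a rough sketch of the two calling styles mentioned above (the procedure names MakeLineByValue and MakeLineByRef are hypothetical, and this does not measure speed; it only shows the shape of each approach):

```freebasic
' Returns the result by value: the caller receives a copy of the string.
FUNCTION MakeLineByValue (BYVAL n AS LONG) AS STRING
   RETURN "Line " & n
END FUNCTION

' Writes the result into a variable supplied by the caller.
SUB MakeLineByRef (BYVAL n AS LONG, BYREF sOut AS STRING)
   sOut = "Line " & n
END SUB

DIM s1 AS STRING = MakeLineByValue(1)
DIM s2 AS STRING
MakeLineByRef(2, s2)

PRINT s1   ' Line 1
PRINT s2   ' Line 2
```

With big strings, the by-reference form avoids producing an intermediate result that then has to be copied into the destination.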

VB6 was a beginner's tool in which ease of use took precedence over efficiency. This is why they got the slowest compiler ever made. It has spoiled generations of programmers.


José Roca

BTW I have implemented a new class, CWstrArray, to work with arrays of CWSTRs. Internally, it uses BSTRs because the safe array APIs work with this type of string, but the in and out parameters are CWSTRs. I have used low-level techniques to avoid copying data as much as possible: for example, when inserting or removing array elements, I don't copy the string data to expand or shrink the array; I only move the BSTR pointers.


' ========================================================================================
' * Deletes the specified element of the array.
' - nPos = Index of the array element to be removed.
' Return value: TRUE or FALSE.
' ========================================================================================
PRIVATE FUNCTION CWstrArray.DeleteItem (BYVAL nPos AS LONG) AS BOOLEAN
   CWSTRARRAY_DP("CWstrArray DeleteItem")
   DIM cElem AS LONG = nPos - this.LBound
   IF nPos < this.LBound OR nPos > this.UBound THEN RETURN FALSE
   DIM cElements AS LONG = this.UBound - this.LBound + 1
   DIM pvData AS AFX_BSTR PTR = this.AccessData
   IF pvData THEN
      ' // Save the element to be deleted
      DIM pTemp AS AFX_BSTR = pvData[cElem]
      ' // Move all the elements up (stop at cElements - 2 so that pvData[i + 1]
      ' // never reads past the last element)
      FOR i AS LONG = cElem TO cElements - 2 STEP 1
         pvData[i] = pvData[i + 1]
      NEXT
      ' // Copy the element to be deleted to the end of the array
      pvData[cElements - 1] = pTemp
   END IF
   this.UnaccessData
   ' // Shrink the array by one element (will free the last element)
   IF this.Redim(cElements - 1) = S_OK THEN RETURN TRUE
END FUNCTION
' ========================================================================================


This makes a BIG difference regarding speed.
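
To show the idea in isolation, here is a minimal sketch that moves one pointer to the end of a plain pointer array without touching the string data. It assumes ZSTRING pointers instead of BSTRs and a hypothetical helper named RotateToEnd; it is not the CWstrArray code.

```freebasic
' Moves the pointer at position idx to the end of the array and shifts the
' following pointers up by one slot. Only pointers are copied, never the text.
SUB RotateToEnd (BYVAL p AS ZSTRING PTR PTR, BYVAL nCount AS LONG, BYVAL idx AS LONG)
   IF idx < 0 OR idx >= nCount THEN EXIT SUB
   DIM pTemp AS ZSTRING PTR = p[idx]
   FOR i AS LONG = idx TO nCount - 2
      p[i] = p[i + 1]
   NEXT
   p[nCount - 1] = pTemp
END SUB

DIM items(0 TO 3) AS ZSTRING PTR = {@"alpha", @"beta", @"gamma", @"delta"}
RotateToEnd(@items(0), 4, 1)

FOR i AS LONG = 0 TO 3
   PRINT *items(i)   ' alpha, gamma, delta, beta
NEXT
```

After the rotation, the element to delete sits in the last slot, so shrinking the array by one element is enough to drop it.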

I also use direct access to get and set the string data instead of the slower SafeArrayGetElement / SafeArrayPutElement API functions.


' ========================================================================================
' * Gets an element of the array. If the function fails, it returns an empty string.
' - idx : The index of the array element.
' ========================================================================================
PRIVATE PROPERTY CWstrArray.Item (BYVAL idx AS LONG) AS CWSTR
   CWSTRARRAY_DP("PROPERTY ITEM [GET] - CWSTR")
   IF m_psa = NULL THEN EXIT PROPERTY
   SafeArrayLock(m_psa)
   DIM pvData AS AFX_BSTR PTR = this.PtrOfIndex(idx)
   IF pvData THEN PROPERTY = *pvData
   SafeArrayUnlock(m_psa)
END PROPERTY
' ========================================================================================

' ========================================================================================
' * Puts a string element at a given location in the array.
' - idx : The index of the array element.
' - cws : The string data to store.
' ========================================================================================
PRIVATE PROPERTY CWstrArray.Item (BYVAL idx AS LONG, BYREF cws AS CWSTR)
   CWSTRARRAY_DP("PROPERTY ITEM [PUT] - CWSTR")
   IF m_psa = NULL THEN EXIT PROPERTY
   SafeArrayLock(m_psa)
   DIM pvData AS AFX_BSTR PTR = this.PtrOfIndex(idx)
   IF pvData THEN *pvData = SysAllocString(**cws)
   SafeArrayUnlock(m_psa)
END PROPERTY
' ========================================================================================


This makes the internal code of many of my wrappers look not very "BASIC", as James has pointed out, but it is efficient.


José Roca

>  I'm just saying "&" could work also and for me the more important word is : also

I know what you mean, but this is like putting in a red button with a label saying "don't push me"... If it must not be pushed, why put it there?

If they use & and it compiles, they will always use & and then complain that it is slow. If it does not compile, they will learn that they have to use + or **.

Marc Pons

I have made some modifications to CWSTR.inc. I am posting them here because they continue the previous posts.

You can see them in the attached file, which includes a test to verify them and possibly compare with the existing version.

The changes affect several topics / behaviours (I have tried to mimic STRING behaviour as much as possible).
The idea is that if you know how to use a string in a specific situation, it works the same way with the CWSTR class!

@cwstr           : now gives the pointer of the cwstr var                      ( as cwstr ptr  )
varptr(cwstr)   : same as above                                             
strptr(cwstr)   : working and give the pointer to the internal buffer       ( as wstring ptr)
*cwstr         : dereference the internal pointer to wstring               ( byref as wstring)

A new & operator to concatenate a CWSTR with a WSTRING (on either side) or with another CWSTR, without **cwstr (or any workaround).

No more need for **cwstr (or any workaround) to use the RIGHT and LEFT functions: they have been overloaded with new ones.

And finally: it is a bit faster at construction, in LET operations and in concatenation too.

Measured: 15 to 25% faster, thanks to optimized operations (allocating only once when possible, avoiding the initial allocation...)
and a few more tweaks.

To test, put cwstr2.inc in the same folder as the test file and compile as a console application; you can trace the actions directly on the console.
Remarks/comments are appreciated, as usual.


Marc

José Roca

I don't agree with removing the operator @ and changing the operator *. Besides breaking all my code and Paul's code, I think that you have not understood why I implemented it this way.

The purpose of the @ and * operators is to allow passing a CWSTR to the Windows API functions that expect a WSTRING by reference. Without them, the compiler will pass a pointer to the CWSTR class instead of the underlying WSTRING and it will crash.

For example, in a function like this one


' ========================================================================================
' Gets the text of a window. This function can also be used to retrieve the text of buttons,
' and edit and static controls.
' Remarks: The function uses the WM_GETTEXT message because GetWindowText cannot retrieve
' the text of a window in another application.
' Example: DIM cws AS CWSTR = AfxGetWindowText(hwnd)
' ========================================================================================
PRIVATE FUNCTION AfxGetWindowText (BYVAL hwnd AS HWND) AS CWSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM cwsText AS CWSTR = SPACE(nLen + 1)
   SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *cwsText))
   RETURN cwsText
END FUNCTION
' ========================================================================================


I can use cast(LPARAM, *cwsText) or cast(LPARAM, @cwsText), but with your suggested changes, I will have to use cast(LPARAM, cwsText.m_pBuffer).

In an earlier version of CBSTR, I used an "Addr" method instead of "**", and the only user who gave his opinion was Paul, who preferred **cws to cws.Addr.

It is nice to use *cws instead of **cws, but it is not so nice to use cws.m_pBuffer (or cws.Addr) instead of @cws.

To work seamlessly with the intrinsic FB functions, the only solution is to have a dynamic unicode data type implemented natively in the compiler.

José Roca

Regarding Right and Left, I didn't know that I could overload these intrinsic functions. I can add these functions to CWSTR:


PRIVATE FUNCTION RIGHT (BYREF cws AS CWSTR, BYREF n AS LONG) AS CWSTR
   RETURN RIGHT(**cws, n)
END FUNCTION

PRIVATE FUNCTION LEFT (BYREF cws AS CWSTR, BYREF n AS LONG) AS CWSTR
   RETURN LEFT(**cws, n)
END FUNCTION


I can also add:


PRIVATE FUNCTION VAL (BYREF cws AS CWSTR) AS DOUBLE
   RETURN VAL(**cws)
END FUNCTION


But, of course, using RIGHT(**cws, n) is faster than RIGHT(cws, n).
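
For readers who want to try the overloading technique without CWSTR.inc, here is a minimal sketch using a hypothetical wrapper type (MyText, with a plain STRING member). It assumes the compiler accepts a user overload of LEFT for a user-defined type in the same way it does for CWSTR above.

```freebasic
' Hypothetical wrapper type, used only for this illustration (not CWSTR).
TYPE MyText
   s AS STRING
END TYPE

' Overload of the intrinsic LEFT for MyText. It forwards to the built-in
' LEFT on the wrapped STRING, so it adds one extra call and one extra copy.
FUNCTION LEFT OVERLOAD (BYREF t AS MyText, BYVAL n AS LONG) AS STRING
   RETURN LEFT(t.s, n)
END FUNCTION

DIM t AS MyText
t.s = "Hello, world"

PRINT LEFT(t, 5)     ' calls the overload: Hello
PRINT LEFT(t.s, 5)   ' calls the intrinsic directly: Hello
```

The second call is the equivalent of using **cws: it skips the wrapper function and goes straight to the intrinsic.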

José Roca

I don't understand the purpose of this function.


' ========================================================================================
' Write the number of bytes from the specified memory address to the buffer.     **** new function
' ========================================================================================
PRIVATE SUB CWstr.WriteBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG)
   if nNumBytes < m_GrowSize /2  THEN nNumBytes += m_GrowSize/2
CWSTR_DP("CWSTR WriteBuffer " & WSTR(nNumBytes))
this.ResizeBuffer( nNumBytes * 2)
   memcpy(m_pBuffer , addrMemory, nNumBytes)
   m_BufferLen = nNumBytes
   ' Mark the end of the string with a double null
   m_pBuffer[m_BufferLen] = 0
   m_pBuffer[m_BufferLen + 1] = 0
   CWSTR_DP("--END - CWSTR WriteBuffer " & WSTR(m_BufferLen))
END SUB
' ========================================================================================


If I use


DIM wsz AS STRING = "abc"
DIM cws AS CWSTR = wsz
PRINT LEN(cws)


It returns a length of 123 instead of 3.