CWindow RC 23

Started by José Roca, October 03, 2016, 03:04:12 PM

Marc Pons

For the WriteBuffer function, my mistake; the function should be as follows (thanks for pointing out that mistake):
PRIVATE SUB CWstr.WriteBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG)
   m_BufferLen = nNumBytes  '****  assign m_BufferLen before modifying nNumBytes
   if nNumBytes < m_GrowSize /2  THEN nNumBytes += m_GrowSize/2
   CWSTR_DP("CWSTR WriteBuffer " & WSTR(nNumBytes))
   this.ResizeBuffer( nNumBytes * 2)
   memcpy(m_pBuffer , addrMemory, nNumBytes)
   ' Mark the end of the string with a double null
   m_pBuffer[m_BufferLen] = 0
   m_pBuffer[m_BufferLen + 1] = 0
   CWSTR_DP("--END - CWSTR WriteBuffer " & WSTR(m_BufferLen))
END SUB


Overloaded functions RIGHT and LEFT: of course you can overload them; they are already overloaded.
I have not tested whether your proposals are faster than mine, but why not.

Sure, the VAL function can also be overloaded.

With your proposal, @cwstr and *cwstr give the same result: a pointer to the buffer (AS WSTRING PTR).

Quote: I can use cast(LPARAM, *cwsText) or cast(LPARAM, @cwsText), but with your suggested changes, I will have to use cast(LPARAM, cwsText.m_pBuffer).
You missed my proposal, which is strptr(cwstr), giving the pointer to the buffer (AS WSTRING PTR),
in a similar way to how it is used with a normal string to get the pointer to the data (AS ZSTRING PTR).
That's why I have undefined the strptr macro and recreated it as functions.

So the use with the Windows API is like this:
PRIVATE FUNCTION AfxGetWindowText (BYVAL hwnd AS HWND) AS CWSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM cwsText AS CWSTR = SPACE(nLen + 1)
   SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, strptr(cwsText)))  ' **** notice the use of strptr as with string
   RETURN cwsText
END FUNCTION


I agree that strptr is longer than @ or *, but it is more "conventional" not to mix the use of @ and *.
Normally, @ is the operator that gets the pointer to the variable and * is the operator that gets the value pointed to by the pointer (or, by extension, as with strings, the value behind the data pointer).

I'm not saying it has to be like that; I'm just taking some time to review the work done and verify that it is still coherent with the other existing behaviour.
I think exceptions (even for good reasons) lead to complexity and, in the end, to errors.
I am also taking the opportunity to see whether some optimizations can be found (speed is crucial for me; I can trade it to some extent for "simplicity", but not too much).

I completely understand that the behaviours are different and can have an impact on other parts of your framework...

but as your framework is not totally frozen...

No comments on & and on speed?
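
For reference, a minimal sketch of the built-in meaning of the two operators on a plain variable, with no overloading involved. The variable names are only illustrative.

```freebasic
' Built-in meaning of @ and * on a plain variable (no overloading involved).
DIM n AS LONG = 42
DIM p AS LONG PTR = @n     ' @ yields the address of n
PRINT *p                   ' * reads the value stored at that address
*p = 7                     ' * yields a reference, so it can also be assigned through
PRINT n                    ' n has changed because p points at it
```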

Marc Pons

For RIGHT, I noticed a mistake in the logic of my proposal,
corrected here:

PRIVATE FUNCTION RIGHT( BYREF cwstr AS CWSTR, BYREF n AS LONG )AS CWSTR
CWSTR_DP("CWSTR -RIGHT FUNCTION-")
FUNCTION = cast(wstring ptr, cwstr.m_pBuffer + cwstr.m_BufferLen - (n * 2))
END FUNCTION


I tested your overload proposal for RIGHT and LEFT: mine (after that correction) are a bit faster.

I can also confirm that, with my modified class, the speed has improved by at least 10% and up to 25%,

even with the new & operator.

Marc Pons

#47
Quote: But, of course, using RIGHT(**cws) is faster than RIGHT(cws).
That is true with your cwstr.inc, but not with cwstr2.inc and my overload proposal for LEFT/RIGHT.

In fact, my solution is almost 2 times faster (because I do not allocate twice...).


stop for today, family lunch now  :)

José Roca

#48
> With your proposal, @cwstr and *cwstr give the same result: a pointer to the buffer (AS WSTRING PTR)

Of course, but it is more intuitive, at least for me, to use @ with out parameters and * with in parameters. VARPTR and STRPTR also return the same value when used with WSTRINGs because there is only one address to return. These data types don't have a descriptor.

> I agree that strptr is longer than @ or *, but it is more "conventional" not to mix the use of @ and *.
> Normally, @ is the operator that gets the pointer to the variable and * is the operator that gets the value pointed to by the pointer (or, by extension, as with strings, the value behind the data pointer).

Not with ZSTRING, WSTRING or fixed-length strings. See above.
And CWSTR is also a null terminated string without a descriptor.

José Roca

> For the WriteBuffer function, my mistake; the function should be as follows (thanks for pointing out that mistake)

I still don't understand why you do that. You pass numBytes to copy and you end up copying 260 additional bytes of garbage?

José Roca

> I tested your overload proposal for RIGHT and LEFT: mine (after that correction) are a bit faster

I still have to check it. It is unsafe code because there are no bounds checks. What if I pass a negative value or a value bigger than the buffer?

Marc Pons

#51
Good, some new input!

If you feel happy mixing @ and *, no problem for me,

but here is the "official" FreeBASIC definition:
Quote: Operator @ (Address of) returns the memory address of its operand.
Operator * (Value of) returns a reference to the value stored at an address, and is often called the dereference operator.
You can choose to overload them to do whatever you want, but in the end it adds complexity.

Quote: VARPTR and STRPTR also return the same value when used with WSTRINGs
True, but both are dedicated to returning a pointer, not a value as * is supposed to.
And with my proposal, VARPTR and STRPTR each give their respective true pointer (one a CWSTR PTR, the other a WSTRING PTR (cast)).

Quote: And CWSTR is also a null terminated string without a descriptor.
:o, for me the CWSTR class is not really different from a dynamic STRING. Let's compare the structure elements:

STRING                      CWSTR
data AS ZSTRING PTR         m_pBuffer   AS UBYTE PTR
len  AS LONG                m_BufferLen AS LONG
size AS LONG                m_Capacity  AS LONG

The data in the STRING type is obviously a null-terminated string, so I do not see a structural difference.

Again, thanks for pointing out the second mistake in CWstr.WriteBuffer; I replied too fast.
The only reason for that function is to avoid going through all the cases handled in CWstr.AppendBuffer (avoiding an initial setup for nothing)
and, if the length is small enough, to increase it a bit so that no resize is needed afterwards, although that is not strictly needed.

Latest proposal:
PRIVATE SUB CWstr.WriteBuffer (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG)
   m_BufferLen = nNumBytes  '****  assign m_BufferLen before modifying nNumBytes
   ' the idea here is to have at least some buffer reserve to not have always to resize if append after
   if nNumBytes < m_GrowSize /2  THEN nNumBytes += m_GrowSize/2  ' can be avoided probably
   CWSTR_DP("CWSTR WriteBuffer " & WSTR(nNumBytes))
   this.ResizeBuffer( nNumBytes * 2)
   memcpy(m_pBuffer , addrMemory, m_BufferLen) ' nNumBytes '**** sure not needed to copy garbage thanks Jose
   ' Mark the end of the string with a double null
   m_pBuffer[m_BufferLen] = 0
   m_pBuffer[m_BufferLen + 1] = 0
   CWSTR_DP("--END - CWSTR WriteBuffer " & WSTR(m_BufferLen))
END SUB
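
The point settled above (reserve extra room, but copy only the payload bytes and then terminate) can be sketched outside the class. This is only an illustration: CopyWithReserve and its parameters are hypothetical names, not part of CWSTR, and the terminator is sized with LEN(WSTRING) so the sketch does not assume 2-byte characters.

```freebasic
#include once "crt/string.bi"   ' memcpy, memset

' Hypothetical stand-alone version of the copy step; not the CWSTR member itself.
' Copies nNumBytes from addrMemory into a new buffer that also has nReserve spare
' bytes, then writes one null character after the copied data.
FUNCTION CopyWithReserve (BYVAL addrMemory AS ANY PTR, BYVAL nNumBytes AS LONG, BYVAL nReserve AS LONG) AS UBYTE PTR
   DIM cbChar AS LONG = LEN(WSTRING)          ' bytes per wide character on this platform
   DIM pBuffer AS UBYTE PTR = ALLOCATE(nNumBytes + nReserve + cbChar)
   IF pBuffer = 0 THEN RETURN 0
   memcpy(pBuffer, addrMemory, nNumBytes)     ' copy only the payload, never the reserve
   memset(pBuffer + nNumBytes, 0, cbChar)     ' null terminator right after the payload
   RETURN pBuffer
END FUNCTION

DIM wsSource AS WSTRING * 16 = "abc"
DIM nBytes AS LONG = LEN(wsSource) * LEN(WSTRING)
DIM pCopy AS UBYTE PTR = CopyWithReserve(@wsSource, nBytes, 260)
IF pCopy THEN
   PRINT *CAST(WSTRING PTR, pCopy)
   DEALLOCATE(pCopy)
END IF
```

The reserve only affects how much memory is allocated; the number of bytes copied stays equal to the length that was passed in.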


And obviously RIGHT and LEFT have been secured:
' New overload of RIGHT for CWSTR
PRIVATE FUNCTION RIGHT (BYREF cwstr AS CWSTR, BYREF n AS LONG) AS CWSTR
   CWSTR_DP("CWSTR -RIGHT FUNCTION-")
   ' m_BufferLen is a byte count, so n (characters) is compared against m_BufferLen \ 2
   IF cwstr.m_BufferLen = 0 OR n <= 0 THEN
      'RETURN ""
      FUNCTION = ""   '**** probably faster with RETURN
   ELSEIF n >= cwstr.m_BufferLen \ 2 THEN
      'RETURN CAST(WSTRING PTR, cwstr.m_pBuffer)
      FUNCTION = CAST(WSTRING PTR, cwstr.m_pBuffer)   '**** probably faster with RETURN
   ELSE
      'RETURN CAST(WSTRING PTR, cwstr.m_pBuffer + cwstr.m_BufferLen - (n * 2))
      FUNCTION = CAST(WSTRING PTR, cwstr.m_pBuffer + cwstr.m_BufferLen - (n * 2))   '**** probably faster with RETURN
   END IF
END FUNCTION

' New overload of LEFT for CWSTR
PRIVATE FUNCTION LEFT (BYREF cwstr AS CWSTR, BYREF n AS LONG) AS CWSTR
   CWSTR_DP("CWSTR -LEFT FUNCTION-")
   ' m_BufferLen is a byte count, so n (characters) is compared against m_BufferLen \ 2
   IF cwstr.m_BufferLen = 0 OR n <= 0 THEN
      FUNCTION = ""
      EXIT FUNCTION
   ELSEIF n >= cwstr.m_BufferLen \ 2 THEN
      FUNCTION = CAST(WSTRING PTR, cwstr.m_pBuffer)
      EXIT FUNCTION
   END IF
   DIM pNewBuffer AS WSTRING PTR = CAST(WSTRING PTR, cwstr.m_pBuffer)
   ' Save the two bytes at the cut position, truncate temporarily, then restore them
   DIM AS UBYTE u1, u2
   u1 = cwstr.m_pBuffer[(n * 2)]
   u2 = cwstr.m_pBuffer[(n * 2) + 1]
   pNewBuffer[n] = 0
   FUNCTION = pNewBuffer
   cwstr.m_pBuffer[(n * 2)] = u1
   cwstr.m_pBuffer[(n * 2) + 1] = u2
END FUNCTION
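
The bounds question raised earlier (a negative count, or a count larger than the buffer) comes down to clamping a byte offset. Below is a minimal sketch on a plain fixed-length WSTRING rather than on CWSTR; RightStartOffset and its parameters are hypothetical names introduced only for this illustration.

```freebasic
' Hypothetical helper: byte offset at which the rightmost nChars characters start,
' for a buffer holding nBufferLenBytes bytes of text with cbChar bytes per character.
FUNCTION RightStartOffset (BYVAL nBufferLenBytes AS LONG, BYVAL nChars AS LONG, BYVAL cbChar AS LONG) AS LONG
   IF nChars <= 0 THEN RETURN nBufferLenBytes             ' nothing requested: start at the terminator
   IF nChars >= nBufferLenBytes \ cbChar THEN RETURN 0    ' whole string or more: start at the beginning
   RETURN nBufferLenBytes - nChars * cbChar
END FUNCTION

DIM wsDemo AS WSTRING * 32 = "Hello world"
DIM cbWide AS LONG = LEN(WSTRING)
DIM nDemoBytes AS LONG = LEN(wsDemo) * cbWide
DIM pBytes AS UBYTE PTR = CAST(UBYTE PTR, @wsDemo)

PRINT *CAST(WSTRING PTR, pBytes + RightStartOffset(nDemoBytes, 5, cbWide))    ' last five characters
PRINT *CAST(WSTRING PTR, pBytes + RightStartOffset(nDemoBytes, 99, cbWide))   ' clamped to the whole string
PRINT *CAST(WSTRING PTR, pBytes + RightStartOffset(nDemoBytes, -3, cbWide))   ' clamped to the empty string
```

Comparing the character count against the byte length divided by the character size keeps both sides of the test in the same unit and avoids overflowing the multiplication for very large counts.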


Everything is in the attachment: CWSTR2.inc and the extended test.

About speed: do you agree that it is, in fact, the most important thing?

José Roca

#52
> for me the CWSTR class is not really different from a dynamic STRING. Let's compare the structure elements

The difference is that the intrinsic functions of the FB compiler can access the members of the FB string structure through the string descriptor. With m_pBuffer alone, you don't have access to the other variables of the class.

My use of @ and * is consistent with other data types such as CBSTR and CSafeArray. These data types need an operator to return the address of the variable that holds the pointer and another operator to return the pointer. Using one system with CWSTR and another with CBSTR and CSafeArray is inconsistent. Both BSTR and SafeArray have descriptors.

> About speed: do you agree that it is, in fact, the most important thing?

The fastest way is to use + or **.

Anyway, the speed of LEFT and RIGHT is not very important because they are used sparingly and usually to return small strings.


Marc Pons

José,

I'm speaking about the speed of the CWSTR class globally: construction, LET...

I will send you a message directly.


I have also noticed here, as you did before, that even though CWSTR is a core element, because it provides the dynamic Unicode string type,
only 2 people give it any attention!

:-\

José Roca

#54
Regarding the & operator, I have noticed the following:

This works without having to use **cwsText


DIM cws AS CWSTR
DIM cwsText AS CWSTR = "test string"
cws = "Line " & 1 & ", Column " & 2 & ": " & cwsText
print cws


This also works


DIM cws AS CWSTR
DIM cwsText AS CWSTR = "test string"
cws = "Line " & STR(1) & ", Column " &  STR(2) & ": " & cwsText
print cws


This fails unless we use **cwsText


DIM cws AS CWSTR
DIM cwsText AS CWSTR = "test string"
cws = "Line " & WSTR(1) & ", Column " &  WSTR(2) & ": " & cwsText
print cws


The behavior of this operator is somewhat erratic. The + operator does not give problems.

Marc Pons

José,

Here is my latest evolution (optimized and simplified): the DWSTR class.

I renamed it so that it can be used alongside your CWSTR class in the same code to compare speed.
The reference CWSTR is the one you posted in the RC 24 evolution.

To keep things simple, I have put everything in the attached file, with my_DWSTR.inc and the code to compare,
plus 2 screenshots of the results on my old XP machine (tested with 1 000 000 steps).


I have also noticed some "not so clear" points about the index for insert, get char code...
I have modified them in my code, but I did not keep track of everything: sorry.


José Roca

I have changed UBYTE to USHORT in the Char properties. Thanks for spotting it.

José Roca

My use of the @ and * operators in the CWSTR class is in accordance with what other languages such as C++ do. For example, the ATL CComBSTR class (C++ uses the & operator instead of @): https://msdn.microsoft.com/en-us/library/5s6et3yb.aspx

Quote
CComBSTR::operator &
Returns the address of the BSTR stored in the m_str member

When we do DIM cws AS CWSTR, cws is NOT the string variable, but the class. We can't pass a pointer to cws to an external third party function because that function has no idea of what a CWSTR is. We have to pass a pointer to the null terminated string that is stored in the m_pBuffer member. Therefore, the @ operator returns the address of the stored null terminated variable, not a pointer to the class.

The * operator acts like STRPTR, that is, it returns the m_pBuffer pointer (a pointer to the beginning of the string data).

Using the FB string data type, we can do:


DIM s AS STRING = "Test string"
DIM p AS ZSTRING PTR = STRPTR(s)
PRINT *p


Using CWSTR, we can do:


DIM cws AS CWSTR = "Test string"
DIM p AS WSTRING PTR = *cws
PRINT *p


That is similar to the first one, but using * instead of STRPTR.

But we can also use the ** shortcut:


DIM cws AS CWSTR = "Test string"
PRINT **cws


that does


DIM p AS WSTRING PTR = *cws
PRINT *p


in a single step.
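
To make the mechanism concrete, here is a minimal sketch of a class that overloads the unary * operator in this way. MiniWstr is a hypothetical stand-in written for this illustration; it is not the actual CWSTR code and leaves out everything except the buffer pointer.

```freebasic
' MiniWstr is a hypothetical stand-in for CWSTR, reduced to the buffer pointer.
TYPE MiniWstr
   m_pBuffer AS WSTRING PTR
   DECLARE CONSTRUCTOR (BYREF wszText AS WSTRING)
   DECLARE DESTRUCTOR ()
END TYPE

CONSTRUCTOR MiniWstr (BYREF wszText AS WSTRING)
   ' Room for the text plus its null terminator
   m_pBuffer = ALLOCATE((LEN(wszText) + 1) * LEN(WSTRING))
   *m_pBuffer = wszText
END CONSTRUCTOR

DESTRUCTOR MiniWstr ()
   DEALLOCATE(m_pBuffer)
END DESTRUCTOR

' Overloaded unary *: returns the pointer to the string data, like STRPTR on a STRING
OPERATOR * (BYREF mws AS MiniWstr) AS WSTRING PTR
   OPERATOR = mws.m_pBuffer
END OPERATOR

DIM mwsDemo AS MiniWstr = "Test string"
DIM pData AS WSTRING PTR = *mwsDemo   ' one * gives the buffer pointer
PRINT *pData                          ' a second * dereferences it
PRINT **mwsDemo                       ' both steps at once
```

The first * is the overloaded operator and the second is the built-in dereference of the WSTRING PTR it returns, which is why ** works without any extra definition.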

I have always found it ANNOYING to have to use VARPTR and STRPTR (too much typing for my taste), and was jealous of C++ programmers, who can use & and *.

Anyway, with the latest changes you can use LEFT, RIGHT, VAL and & without having to use **cws.


José Roca

The Capacity allows you to preallocate the size of the buffer if you know the size of the result string, or even an approximation of it, to avoid multiple allocations.

The CodePage is not useless. Just because you can build a CWSTR by concatenating strings with different code pages doesn't mean that you have to. If you really need to use a code page, you will have to use a function like AfxUcode, which allows you to specify the code page. Free Basic should add an optional code page parameter to STR and WSTR.

José Roca

#59
These are the results of your test on my computer. As I said, using + or ** with CWSTR is faster than using &, so why not just use +? The other differences are not significant: just a few milliseconds in a million concatenations and assignments.
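
A comparison of this kind can be reproduced with a very small harness. The sketch below is only an assumption about how such a test might look: it uses the built-in STRING type rather than CWSTR, so its timings say nothing about the class itself, and the step count is arbitrary.

```freebasic
' Rough timing sketch with the built-in STRING type; CWSTR itself is not used here.
CONST nSteps AS LONG = 100000
DIM sResult AS STRING
DIM tStart AS DOUBLE

tStart = TIMER
FOR i AS LONG = 1 TO nSteps
   sResult = "Line " + STR(i) + ", Column " + STR(i)    ' + needs explicit STR()
NEXT
PRINT "+ operator: "; TIMER - tStart; " seconds"

tStart = TIMER
FOR i AS LONG = 1 TO nSteps
   sResult = "Line " & i & ", Column " & i              ' & converts the numbers itself
NEXT
PRINT "& operator: "; TIMER - tStart; " seconds"
```

TIMER has limited resolution, so differences of a few milliseconds over a run like this are within the noise.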