Unicode usage precautions vs codepage

Started by Marc Pons, July 18, 2016, 07:25:05 AM


José Roca

#15
This is what I'm going to put in the help file:

CWSTR and CBSTR are classes that implement dynamic Unicode data types. Free Basic has a dynamic string data type (STRING) and a fixed-length Unicode data type (WSTRING). What it lacks is a dynamic Unicode string. CBSTR uses Windows BSTRs and is slower than CWSTR, which uses a dynamic buffer. Therefore, CBSTR should be reserved for COM programming and for cases that need Unicode strings with embedded nulls.

CBSTR and CWSTR behave almost as if they were native data types, working directly with most intrinsic Free Basic string functions and operators. There are some exceptions, such as LEFT, RIGHT and VAL, which require a double indirection, e.g. LEFT(**cws, 10), to pass a pointer to the string data. The reason these functions don't work with, e.g., LEFT(cws, 10) is that they don't generate temporary strings, so the operators of the CBSTR and CWSTR classes aren't called.

They work transparently with Free Basic native strings and literals, e.g.


DIM cws AS CWSTR = "One"
DIM s AS STRING = "Three"
cws = cws & " Two " & s
PRINT cws


They can be used like native strings to call Windows API functions, e.g.


PRIVATE FUNCTION AfxGetWindowText (BYVAL hwnd AS HWND) AS CWSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM wszText AS CWSTR = SPACE(nLen + 1)
   SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *wszText))
   RETURN wszText
END FUNCTION


To use them with languages that don't use the Latin alphabet, you can specify the code page (CP_UTF8 is also supported):


DIM cws AS CWSTR = CWSTR("Закрыть", 1251)   ' 1251, Russian code page
SetWindowText hwnd, cws


Important remark: When returning a CBSTR or CWSTR as the result of a function, always use RETURN <variable name> and not FUNCTION = <variable name>. This is because RETURN and FUNCTION behave differently when returning temporary types with constructors.

When using RETURN <variable name>, the compiler correctly calls the constructor of the temporary type, allowing the class to copy the data of the string to be returned, and then calls the destructor of the copied CBSTR or CWSTR when the variable goes out of scope.

When using FUNCTION = <variable name>, the compiler first calls the destructor of the string to be copied and then the constructor of the new temporary type, making it impossible for the class to copy the data. Although it generally works with CBSTR strings, because Windows by default caches BSTRs that have been freed with SysFreeString, it will certainly crash when returning a CWSTR.
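
To see the difference on your own compiler version, a small tracer can help. The following is only a sketch: Tracer, ViaReturn and ViaFunctionAssign are hypothetical names invented for this illustration and are not part of the CWSTR/CBSTR classes. The type simply prints a line from each constructor, from its assignment operator and from its destructor, so the order of the calls made for RETURN and for FUNCTION = can be compared.

```freebasic
' Hypothetical tracer type (an assumption for illustration only;
' it is not part of the CWSTR or CBSTR classes).
TYPE Tracer
   id AS LONG
   DECLARE CONSTRUCTOR ()
   DECLARE CONSTRUCTOR (BYREF rhs AS Tracer)
   DECLARE DESTRUCTOR ()
   DECLARE OPERATOR LET (BYREF rhs AS Tracer)
END TYPE

CONSTRUCTOR Tracer ()
   PRINT "  default constructor"
END CONSTRUCTOR

CONSTRUCTOR Tracer (BYREF rhs AS Tracer)
   this.id = rhs.id
   PRINT "  copy constructor, id ="; rhs.id
END CONSTRUCTOR

DESTRUCTOR Tracer ()
   PRINT "  destructor, id ="; this.id
END DESTRUCTOR

OPERATOR Tracer.LET (BYREF rhs AS Tracer)
   this.id = rhs.id
   PRINT "  assignment (LET), id ="; rhs.id
END OPERATOR

' Returns the local variable with RETURN
FUNCTION ViaReturn () AS Tracer
   DIM t AS Tracer
   t.id = 1
   RETURN t
END FUNCTION

' Returns the local variable with FUNCTION =
FUNCTION ViaFunctionAssign () AS Tracer
   DIM t AS Tracer
   t.id = 2
   FUNCTION = t
END FUNCTION

PRINT "--- RETURN ---"
SCOPE
   DIM a AS Tracer = ViaReturn()
END SCOPE

PRINT "--- FUNCTION = ---"
SCOPE
   DIM b AS Tracer = ViaFunctionAssign()
END SCOPE
```

The trace printed under each heading shows which of the class's procedures the compiler invokes, and in which order, for each way of returning the value. A class that owns a buffer, as CWSTR does, depends on that order being safe.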