AfxStr - Unicode String Functions

Started by José Roca, July 07, 2016, 01:35:30 AM


José Roca

#15
A demonstration of what I mean.

Currently, to implement the function below, we have to declare an output WSTRING parameter, dim a WSTRING large enough to hold the result, and pass it. The problem is that WSTRINGs have a fixed size: if the WSTRING is not large enough, the text doesn't fit, and if we dimension them with big sizes, we waste memory.

We can also declare the return type of the function as WSTRING PTR and return a pointer to a dynamically allocated buffer. The problem is that we can't pass this pointer directly to another function without creating a memory leak: we have to assign the pointer to a variable, pass the variable, and then free it.

Using CBSTR to return the result...


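' ========================================================================================
' Gets the text of a window.
' Note: Unlike GetWindowText, sending WM_GETTEXT can also retrieve the text of a control
' in another application.
' ========================================================================================
FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG, pbuffer AS WSTRING PTR
   ' Number of characters in the window text, not counting the terminating null
   nLen = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   ' Zero-initialized buffer of nLen + 1 wide characters (2 bytes each)
   pbuffer = CAllocate(nLen + 1, 2)
   ' On allocation failure, return the default (empty) CBSTR
   IF pbuffer = NULL THEN EXIT FUNCTION
   ' WM_GETTEXT returns the number of characters actually copied
   nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, pbuffer))
   DIM cbText AS CBSTR = TYPE<CBSTR>(LEFT(*pbuffer, nLen))
   ' The CBSTR holds its own copy, so the temporary buffer can be released
   Deallocate pbuffer
   FUNCTION = cbText
END FUNCTION
' ========================================================================================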
' ========================================================================================


we can simply write:


MessageBoxW hwndMain, AfxGetWindowTextW(hwndMain), "", MB_OK


And we don't have to worry about freeing the memory, because the CBSTR class takes care of it.

(I have added an operator that allows passing the BSTR handle held by the CBSTR class without having to use the * operator.)


' ========================================================================================
OPERATOR CBStr.CAST () AS ANY PTR
   OPERATOR =  CAST(ANY PTR, m_bstr)
END OPERATOR
' ========================================================================================


José Roca

#16
Instead of allocating a buffer with CAllocate, we can also use:


' ========================================================================================
FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM cbText AS CBSTR = TYPE<CBSTR>(SPACE(nLen + 1))
   nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *cbText))
   FUNCTION = LEFT(**cbText, nLen)
END FUNCTION
' ========================================================================================


In this small function, it doesn't matter which method we use. The problem arises when we have to do many concatenations, because each one frees the old string and allocates a new one.

José Roca

#17
Also, since FB defines a BSTR as a pointer to a WSTRING, we can't have one overloaded method for WSTRINGs and another for BSTRs, so the LET operator has no direct way to tell whether the pointer it receives points to a BSTR or to a plain WSTRING. I have had to use a trick.


' ========================================================================================
OPERATOR CBStr.Let (BYREF bstrHandle AS AFX_BSTR)
   IF bstrHandle = NULL THEN EXIT OPERATOR
   ' Assigning our own handle to ourselves: nothing to do (freeing it first would destroy it)
   IF bstrHandle = m_bstr THEN EXIT OPERATOR
   ' Free the current OLE string
   IF m_bstr THEN SysFreeString(m_bstr)
   ' Detect whether the passed handle is an OLE string.
   ' An OLE string is preceded by a DWORD holding its length in bytes; a plain WSTRING is not.
   ' Read that prefix and convert the byte count to a character count.
   ' Note: this is a heuristic; for a plain WSTRING it reads 4 bytes that don't belong to it.
   DIM res AS DWORD = PEEK(DWORD, CAST(ANY PTR, bstrHandle) - 4) \ 2
   ' If the prefix matches the length returned by LEN, assume it is an OLE string
   IF res = .LEN(*bstrHandle) THEN
      ' Attach the passed handle to the class (the class now owns it)
      m_bstr = bstrHandle
   ELSE
      ' Allocate an OLE string with the contents of the string pointed to by bstrHandle
      m_bstr = SysAllocString(*bstrHandle)
   END IF
END OPERATOR
' ========================================================================================
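
To see what the trick checks, here is a minimal C model. It is not the real OLEAUT32 allocator: the hypothetical model_alloc_bstr stores the byte length in a 4-byte prefix directly before the characters, as BSTRs do, and looks_like_bstr repeats the test from CBStr.Let (prefix \ 2 compared with the null-terminated length). It also shows the weak spot of the heuristic: a plain wide string that happens to be preceded by a matching value is taken for a BSTR. All names here are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t wchar16;  /* BSTR characters are 16-bit UTF-16 code units */

/* Hypothetical model of a BSTR block: [uint32 byte length][chars][16-bit 0].
   The returned pointer addresses the first character, not the prefix. */
static wchar16 *model_alloc_bstr(const wchar16 *src, uint32_t nchars) {
    unsigned char *block = malloc(sizeof(uint32_t) + (nchars + 1u) * sizeof(wchar16));
    if (!block) return NULL;
    uint32_t bytes = nchars * (uint32_t)sizeof(wchar16);
    memcpy(block, &bytes, sizeof bytes);           /* length prefix, in bytes */
    wchar16 *str = (wchar16 *)(block + sizeof(uint32_t));
    if (bytes) memcpy(str, src, bytes);
    str[nchars] = 0;                               /* terminator, not counted */
    return str;
}

static void model_free_bstr(wchar16 *str) {
    if (str) free((unsigned char *)str - sizeof(uint32_t));
}

static size_t wlen16(const wchar16 *s) {
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

/* Same test as CBStr.Let: prefix / 2 == null-terminated length.
   For a pointer that is NOT a BSTR, this reads 4 bytes that do not belong
   to the string, and a matching value there gives a false positive. */
static int looks_like_bstr(const wchar16 *p) {
    uint32_t prefix;
    assert(p != NULL);
    memcpy(&prefix, (const unsigned char *)p - sizeof(uint32_t), sizeof prefix);
    return prefix / 2 == wlen16(p);
}

/* Helper to lay out a plain wide string with a chosen 32-bit value just before it. */
typedef struct { uint32_t before; wchar16 s[4]; } preceded_str;
```

For example, a block built by model_alloc_bstr for "pepe" passes the check, a plain "abc" preceded by 0xFFFFFFFF fails it, and a plain "abc" that happens to be preceded by the value 6 passes it even though it is not a BSTR.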


Paul Squires

Quote from: Jose Roca on July 07, 2016, 10:33:42 PM
Instead of allocating a buffer with CAllocate, we can also use:


' ========================================================================================
FUNCTION AfxGetWindowTextW (BYVAL hwnd AS HWND) AS CBSTR
   DIM nLen AS LONG, pbuffer AS WSTRING PTR
   nLen = SendMessageW(hwnd, WM_GETTEXTLENGTH, 0, 0)
   DIM cbText AS CBSTR = TYPE<CBSTR>(SPACE(nLen + 1))
   nLen = SendMessageW(hwnd, WM_GETTEXT, nLen + 1, cast(LPARAM, *cbText))
   FUNCTION = LEFT(**cbText, nLen)
END FUNCTION
' ========================================================================================


In this small function, it doesn't matter if we use one method or another.

I like this approach a bit better.

Quote
The problem is when we have to do many concatenations.

I think we should do some tests to determine if there is actually a huge speed problem or not. It may be insignificant or immaterial.
Paul Squires
PlanetSquires Software

Paul Squires

Quote from: TechSupport on July 07, 2016, 10:50:11 PM
I think we should do some tests to determine if there is actually a huge speed problem or not. It may be insignificant or immaterial.

I am going to do these tests first thing in the morning.

Paul Squires

My initial tests show that it gets disproportionately slower as the number of concatenations increases. A thousand or two is not bad, but once we reach 10,000 the time starts to increase quickly, by orders of magnitude. More tests in the morning.
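
One plausible explanation (an assumption, not something measured here): if each concatenation allocates a new string and copies both the old contents and the appended piece into it, the total amount of copying grows with the square of the number of concatenations. This hypothetical C helper only counts the characters such a loop would copy; it allocates nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Characters copied when a string is built by n appends of piece_len characters,
   if every append allocates a new buffer and copies old contents + new piece. */
static uint64_t chars_copied_alloc_and_copy(uint64_t n, uint64_t piece_len) {
    uint64_t total = 0, current_len = 0;
    for (uint64_t i = 0; i < n; i++) {
        total += current_len + piece_len;   /* old contents + the new piece */
        current_len += piece_len;
    }
    return total;
}

/* Closed form of the same sum: piece_len * n * (n + 1) / 2 */
static uint64_t chars_copied_closed_form(uint64_t n, uint64_t piece_len) {
    return piece_len * n * (n + 1) / 2;
}
```

With one-character pieces, 1,000 appends copy 500,500 characters and 10,000 appends copy 50,005,000: ten times the appends, about a hundred times the copying, before counting the cost of the allocations themselves.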

Paul Squires

Hi Jose,

I am off to bed now, but I wanted to post this article that I just started to read. The gist seems to be that, for an append, SysReAllocStringLen is much faster than creating a new string and copying both parts into it.
http://technolog.nl/blogs/eprogrammer/archive/2006/07/25/Boost-BSTR-performance-for-free_2C00_-by-3000_2500_.aspx

José Roca

I don't know whether SysReAllocStringLen will make a big difference. SysReAllocString certainly won't.

Source code of SysReAllocString (Wine version):


/******************************************************************************
*      SysReAllocString    [OLEAUT32.3]
*
* Change the length of a previously created BSTR.
*
* PARAMS
*  old [I/O] BSTR to change the length of
*  str [I]   New source for pbstr
*
* RETURNS
*  Success: 1
*  Failure: 0.
*
* NOTES
*  See BSTR(), SysAllocStringStringLen().
*/
INT WINAPI SysReAllocString(LPBSTR old,LPCOLESTR str)
{
   /*
   * Sanity check
   */
   if (old==NULL)
      return 0;

   /*
   * Make sure we free the old string.
   */
   SysFreeString(*old);

   /*
   * Allocate the new string
   */
   *old = SysAllocString(str);

   return 1;
}


SysReAllocString does the same as I'm doing: it frees the old string and allocates a new one.

José Roca

#23
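SysReAllocStringLen is different: it resizes the existing allocation with CoTaskMemRealloc instead of freeing it and allocating a new one: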


/******************************************************************************
*             SysReAllocStringLen   [OLEAUT32.5]
*
* Change the length of a previously created BSTR.
*
* PARAMS
*  old [O] BSTR to change the length of
*  str [I] New source for pbstr
*  len [I] Length of oleStr in wide characters
*
* RETURNS
*  Success: 1. The size of pbstr is updated.
*  Failure: 0, if len >= 0x80000000 or memory allocation fails.
*
* NOTES
*  See BSTR(), SysAllocStringByteLen().
*  *old may be changed by this function.
*/
int WINAPI SysReAllocStringLen(BSTR* old, const OLECHAR* str, unsigned int len)
{
   /* Detect integer overflow. */
   if (len >= ((UINT_MAX-sizeof(WCHAR)-sizeof(DWORD))/sizeof(WCHAR)))
      return FALSE;

   if (*old!=NULL) {
      DWORD newbytelen = len*sizeof(WCHAR);
      bstr_t *old_bstr = bstr_from_str(*old);
      bstr_t *bstr = CoTaskMemRealloc(old_bstr, bstr_alloc_size(newbytelen));

      if (!bstr) return FALSE;

      *old = bstr->u.str;
      bstr->size = newbytelen;
      /* The old string data is still there when str is NULL */
      if (str && old_bstr->u.str != str) memmove(bstr->u.str, str, newbytelen);
      bstr->u.str[len] = 0;
   } else {
      *old = SysAllocStringLen(str, len);
   }

   return TRUE;
}
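
Following the Wine code above, an append could be built on this kind of resize: grow the block to the combined length, passing NULL as the source so the old characters stay where they are, then copy only the new piece after them. The sketch below models that with the C runtime's realloc on a hypothetical length-prefixed block; model_realloc_len and model_bstr_append are illustrative names, not OLEAUT32 functions. realloc may still move and copy the block, but it can often extend it in place, and the old contents are never copied a second time by our own code. Keeping the old data when str is NULL is what the Wine implementation does; check the behaviour of the real OLEAUT32 before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t wchar16;

/* Hypothetical block layout: [uint32 byte length][chars][16-bit 0] */
typedef struct {
    uint32_t size;      /* length in bytes, excluding the terminator */
    wchar16 str[];      /* characters */
} model_bstr;

static model_bstr *model_from_str(const wchar16 *s) {
    return (model_bstr *)((const unsigned char *)s - offsetof(model_bstr, str));
}

static uint32_t model_len(const wchar16 *s) {
    return s ? model_from_str(s)->size / (uint32_t)sizeof(wchar16) : 0;
}

/* Resize *old to len characters, keeping the existing characters
   (like SysReAllocStringLen with str == NULL in the Wine code). */
static int model_realloc_len(wchar16 **old, uint32_t len) {
    /* Same overflow guard as the Wine code */
    if (len >= (UINT32_MAX - sizeof(wchar16) - sizeof(uint32_t)) / sizeof(wchar16))
        return 0;
    model_bstr *blk = *old ? model_from_str(*old) : NULL;
    model_bstr *nb = realloc(blk, sizeof(model_bstr) + (len + 1u) * sizeof(wchar16));
    if (!nb) return 0;                 /* *old is still valid on failure */
    nb->size = len * (uint32_t)sizeof(wchar16);
    nb->str[len] = 0;
    *old = nb->str;
    return 1;
}

/* Append n characters: resize once, then copy only the new piece.
   piece must not point into *bs, because realloc may move the block. */
static int model_bstr_append(wchar16 **bs, const wchar16 *piece, uint32_t n) {
    uint32_t old_len = model_len(*bs);
    if (!model_realloc_len(bs, old_len + n)) return 0;
    memcpy(*bs + old_len, piece, n * sizeof(wchar16));
    return 1;
}

static void model_free(wchar16 *s) {
    if (s) free(model_from_str(s));
}
```

Starting from a NULL handle, appending "ab" and then "cde" leaves a five-character string whose prefix says 10 bytes, followed by the terminator.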


José Roca

#24
The BSTR cache can be disabled by calling SetOaNoCache.

See: https://msdn.microsoft.com/en-us/library/windows/desktop/ms644360(v=vs.85).aspx

The cache has been behind some posts in the PB Forum claiming that PB WSTRINGs leaked, because the posters didn't see the memory consumption shown in Task Manager go down after the variable went out of scope.

In fact, if the cache is not disabled, you can still read the BSTR after calling SysFreeString, although this is a use-after-free that only works by accident, e.g.:


DIM bs AS BSTR
bs = SysAllocString("pepe")
MessageBoxW 0, bs, "1", MB_OK
SysFreeString bs
MessageBoxW 0, bs, "2", MB_OK


José Roca

Source code of SysFreeString (Wine version):


/******************************************************************************
*      SysFreeString   [OLEAUT32.6]
*
* Free a BSTR.
*
* PARAMS
*  str [I] BSTR to free.
*
* RETURNS
*  Nothing.
*
* NOTES
*  See BSTR.
*  str may be NULL, in which case this function does nothing.
*/
void WINAPI SysFreeString(BSTR str)
{
    bstr_cache_entry_t *cache_entry;
    bstr_t *bstr;
    IMalloc *malloc = get_malloc();
    SIZE_T alloc_size;

    if(!str)
        return;

    bstr = bstr_from_str(str);

    alloc_size = IMalloc_GetSize(malloc, bstr);
    if (alloc_size == ~0UL)
        return;

    cache_entry = get_cache_entry_from_alloc_size(alloc_size);
    if(cache_entry) {
        unsigned i;

        EnterCriticalSection(&cs_bstr_cache);

        /* According to tests, freeing a string that's already in cache doesn't corrupt anything.
         * For that to work we need to search the cache. */
        for(i=0; i < cache_entry->cnt; i++) {
            if(cache_entry->buf[(cache_entry->head+i) % BUCKET_BUFFER_SIZE] == bstr) {
                WARN_(heap)("String already is in cache!\n");
                LeaveCriticalSection(&cs_bstr_cache);
                return;
            }
        }

        if(cache_entry->cnt < sizeof(cache_entry->buf)/sizeof(*cache_entry->buf)) {
            cache_entry->buf[(cache_entry->head+cache_entry->cnt) % BUCKET_BUFFER_SIZE] = bstr;
            cache_entry->cnt++;

            if(WARN_ON(heap)) {
                unsigned n = (alloc_size-FIELD_OFFSET(bstr_t, u.ptr))/sizeof(DWORD);
                for(i=0; i<n; i++)
                    bstr->u.dwptr[i] = ARENA_FREE_FILLER;
            }

            LeaveCriticalSection(&cs_bstr_cache);
            return;
        }

        LeaveCriticalSection(&cs_bstr_cache);
    }

    CoTaskMemFree(bstr);
}


José Roca

SysAllocString simply delegates the work to SysAllocStringLen:


BSTR WINAPI SysAllocString(LPCOLESTR str)
{
    if (!str) return 0;

    /* Delegate this to the SysAllocStringLen method. */
    return SysAllocStringLen(str, lstrlenW(str));
}


And this is the source code for SysAllocStringLen:


/******************************************************************************
*             SysAllocStringLen     [OLEAUT32.4]
*
* Create a BSTR from an OLESTR of a given wide character length.
*
* PARAMS
*  str [I] Source to create BSTR from
*  len [I] Length of oleStr in wide characters
*
* RETURNS
*  Success: A newly allocated BSTR from SysAllocStringByteLen()
*  Failure: NULL, if len is >= 0x80000000, or memory allocation fails.
*
* NOTES
*  See BSTR(), SysAllocStringByteLen().
*/
BSTR WINAPI SysAllocStringLen(const OLECHAR *str, unsigned int len)
{
    bstr_t *bstr;
    DWORD size;

    /* Detect integer overflow. */
    if (len >= ((UINT_MAX-sizeof(WCHAR)-sizeof(DWORD))/sizeof(WCHAR)))
        return NULL;

    TRACE("%s\n", debugstr_wn(str, len));

    size = len*sizeof(WCHAR);
    bstr = alloc_bstr(size);
    if(!bstr)
        return NULL;

    if(str) {
        memcpy(bstr->u.str, str, size);
        bstr->u.str[len] = 0;
    }else {
        memset(bstr->u.str, 0, size+sizeof(WCHAR));
    }

    return bstr->u.str;
}


José Roca

And this is the source for SysAllocStringByteLen:


/******************************************************************************
*             SysAllocStringByteLen     [OLEAUT32.150]
*
* Create a BSTR from an OLESTR of a given byte length.
*
* PARAMS
*  str [I] Source to create BSTR from
*  len [I] Length of oleStr in bytes
*
* RETURNS
*  Success: A newly allocated BSTR
*  Failure: NULL, if len is >= 0x80000000, or memory allocation fails.
*
* NOTES
*  -If len is 0 or oleStr is NULL the resulting string is empty ("").
*  -This function always NUL terminates the resulting BSTR.
*  -oleStr may be either an LPCSTR or LPCOLESTR, since it is copied
*  without checking for a terminating NUL.
*  See BSTR.
*/
BSTR WINAPI SysAllocStringByteLen(LPCSTR str, UINT len)
{
    bstr_t *bstr;

    /* Detect integer overflow. */
    if (len >= (UINT_MAX-sizeof(WCHAR)-sizeof(DWORD)))
        return NULL;

    bstr = alloc_bstr(len);
    if(!bstr)
        return NULL;

    if(str) {
        memcpy(bstr->u.ptr, str, len);
        bstr->u.ptr[len] = 0;
    }else {
        memset(bstr->u.ptr, 0, len+1);
    }
    bstr->u.str[(len+sizeof(WCHAR)-1)/sizeof(WCHAR)] = 0;

    return bstr->u.str;
}


José Roca

This is the source of alloc_bstr, which is called by SysAllocStringLen and SysAllocStringByteLen:


static bstr_t *alloc_bstr(size_t size)
{
    bstr_cache_entry_t *cache_entry = get_cache_entry(size);
    bstr_t *ret;

    if(cache_entry) {
        EnterCriticalSection(&cs_bstr_cache);

        if(!cache_entry->cnt) {
            cache_entry = get_cache_entry(size+BUCKET_SIZE);
            if(cache_entry && !cache_entry->cnt)
                cache_entry = NULL;
        }

        if(cache_entry) {
            ret = cache_entry->buf[cache_entry->head++];
            cache_entry->head %= BUCKET_BUFFER_SIZE;
            cache_entry->cnt--;
        }

        LeaveCriticalSection(&cs_bstr_cache);

        if(cache_entry) {
            if(WARN_ON(heap)) {
                size_t fill_size = (FIELD_OFFSET(bstr_t, u.ptr[size])+2*sizeof(WCHAR)-1) & ~(sizeof(WCHAR)-1);
                memset(ret, ARENA_INUSE_FILLER, fill_size);
                memset((char *)ret+fill_size, ARENA_TAIL_FILLER, bstr_alloc_size(size)-fill_size);
            }
            ret->size = size;
            return ret;
        }
    }

    ret = CoTaskMemAlloc(bstr_alloc_size(size));
    if(ret)
        ret->size = size;
    return ret;
}
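
Put together, SysFreeString and alloc_bstr implement a small per-size cache: freed blocks of a given size class are parked in a bucket instead of being returned to the allocator, and the next allocation of that size class takes one back. That is why the memory doesn't go down in Task Manager and why a freed BSTR can still be read. The C model below is much simplified and all its constants and names are hypothetical: the caller passes the size back in (Wine asks IMalloc_GetSize), each bucket is a simple stack rather than Wine's ring buffer, and it doesn't try the next bucket up as alloc_bstr does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_GRANULARITY 16   /* hypothetical size-class width, in bytes */
#define NUM_BUCKETS 4           /* hypothetical: sizes up to 64 bytes are cached */
#define BUCKET_CAPACITY 6       /* hypothetical: blocks kept per size class */

typedef struct {
    void *buf[BUCKET_CAPACITY];
    unsigned cnt;
} bucket;

static bucket cache[NUM_BUCKETS];

/* Map a request size to its size class, or -1 if it is too big to cache. */
static int size_class(size_t size) {
    size_t idx = size == 0 ? 0 : (size - 1) / BUCKET_GRANULARITY;
    return idx < NUM_BUCKETS ? (int)idx : -1;
}

static void *cached_alloc(size_t size) {
    int c = size_class(size);
    if (c >= 0 && cache[c].cnt > 0)
        return cache[c].buf[--cache[c].cnt];     /* reuse a parked block */
    /* allocate the whole size class so any request in that class fits */
    return malloc(c >= 0 ? (size_t)(c + 1) * BUCKET_GRANULARITY : size);
}

static void cached_free(void *p, size_t size) {
    if (!p) return;
    int c = size_class(size);
    if (c >= 0 && cache[c].cnt < BUCKET_CAPACITY) {
        cache[c].buf[cache[c].cnt++] = p;        /* park it: memory stays allocated */
        return;
    }
    free(p);                                     /* bucket full or block too big */
}
```

Freeing a 10-byte block and then asking for 12 bytes hands back the very same pointer, with whatever it contained still in place.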


José Roca

The code for bstr_from_str, which is called by SysReAllocStringLen and SysFreeString, is:


static inline bstr_t *bstr_from_str(BSTR str)
{
    return CONTAINING_RECORD(str, bstr_t, u.str);
}


But I can't find the code for CONTAINING_RECORD, which is apparently a macro.
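
CONTAINING_RECORD is defined in the Windows header winnt.h (Wine ships its own winnt.h). Given the address of a field, it steps back by that field's offset to get a pointer to the enclosing structure. A portable equivalent can be written with offsetof, as in the sketch below. The typedefs stand in for the Windows types, and the bstr_t here only approximates Wine's layout (a DWORD size followed by a union with the string data); the field names are taken from the code above, but treat the exact definition as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR;
typedef uint32_t DWORD;
typedef WCHAR *BSTR;

/* From the address of `field`, step back by its offset to reach the start of `type`. */
#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Approximation of Wine's bstr_t: byte length, then the string data. */
typedef struct {
    DWORD size;
    union {
        char ptr[1];
        WCHAR str[1];
        DWORD dwptr[1];
    } u;
} bstr_t;

static inline bstr_t *bstr_from_str(BSTR str) {
    return CONTAINING_RECORD(str, bstr_t, u.str);
}
```

With this layout the BSTR handed to callers points 4 bytes past the start of the block, which is the same length prefix that the CBStr.Let trick peeks at.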