FF_STRING library

Started by James Fuller, July 04, 2016, 11:15:36 AM


James Fuller

Attached are Paul's FF_xxxxxx string routines with PRIVATE added to each.
I also changed the order of the parameters in FF_Parse and FF_ParseAny so that the delimiter is last, which allows it to default to ",".

James
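Because the delimiter now comes last, it can have a default value and be left out at the call site. The following is only a hypothetical sketch (not the attached code) of what such a signature looks like; it assumes FF_Parse returns the 1-based nPosition-th field, and an empty string when there is no such field. The parameter names are made up for the illustration.

```freebasic
' Hypothetical sketch, not the attached FF_Parse: returns the nPosition-th
' (1-based) field of sMain. The delimiter is the LAST parameter, so it can
' default to ",".
Private Function FF_Parse(ByRef sMain As String, ByVal nPosition As Long, _
                          ByRef sDelim As String = ",") As String
    If nPosition < 1 OrElse Len(sDelim) = 0 Then Return ""
    Dim As Long nStart = 1, nField = 1
    Do
        Dim As Long nEnd = InStr(nStart, sMain, sDelim)
        If nField = nPosition Then
            If nEnd = 0 Then Return Mid(sMain, nStart)   ' last field
            Return Mid(sMain, nStart, nEnd - nStart)
        End If
        If nEnd = 0 Then Return ""                        ' fewer fields than nPosition
        nStart = nEnd + Len(sDelim)                       ' skip past the delimiter
        nField += 1
    Loop
End Function

Print FF_Parse("one,two,three", 2)        ' default delimiter ","
Print FF_Parse("one|two|three", 3, "|")   ' explicit delimiter
```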

Paul Squires

Awesome, thanks James!

I am going to revisit this code again soon because I want to create one library that handles both ANSI and Unicode. Since working on the new editor I am using about 90% Unicode now and I need string handling routines that work well with Unicode also.

I am also thinking that, rather than making each function PRIVATE, maybe I should keep each one PUBLIC but compile each into a separate object file and assemble them into a static library. For larger multi-module projects this makes a lot of sense, because then only one copy of each routine is linked in rather than a copy for each independently compiled module. I am still working my way through these kinds of questions because I have never had to deal with them in the past. With PB we just #INCLUDEd everything into the main application .BAS file rather than linking separate standalone modules, as is common practice in C/C++. I never used PB's SLL libraries either. The new editor will make it pretty easy to create standalone libraries composed of hundreds of individual source files (for granularity purposes). I can see this being quite useful for large code bases like Jose's AfxCtl library.
Paul Squires
PlanetSquires Software

José Roca

What C programmers do is not always the best way; many times they are constrained by the tools they use. I'm tired of downloading C++ demos that are very difficult to follow because the code is split into dozens of files. And batch makefiles for this, batch makefiles for that, environment variables... Crazy!

I once tried PB's SLL system... A lot of work to put every procedure into a separate file and to prepare header files, and having to rebuild everything every time you change a comma... And the final result was that it compiled slightly faster using includes than using SLLs! On my computer, compiling with include files takes longer only the first time; after that the files are cached and it compiles very fast.

I'm certainly not going to split the code into hundreds of tiny .bas files. If it came to that, I would prefer not to use it at all and just use SendMessageW.

James Fuller

Quote from: TechSupport on July 05, 2016, 10:27:55 AM
I am going to revisit this code again soon because I want to create one library that handles both ANSI and Unicode. Since working on the new editor I am using about 90% Unicode now and I need string handling routines that work well with Unicode also.

Paul,
  How will you do unicode with no native BSTR?

James

Paul Squires

Jose - right you are, and I'm going to stick with the INCLUDE approach as well. :)

James - Instead of returning strings as the result of a FUNCTION, I would design it so that the IN string and the OUT string (and the OUT string's buffer length) are passed to the function as parameters. The operation would run and the result would be assigned to the OUT parameter (rather than FUNCTION = strResult), so it would be more like a SUB than a FUNCTION. I could do that for the Unicode versions but keep FUNCTION = strResult for the ANSI versions, and just use overloading to determine which version is called. I hate that FB does not have a native, built-in dynamic WSTRING.

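A minimal sketch of the design Paul describes, under stated assumptions: the ANSI overload simply returns a String, while the Unicode overload writes into a caller-supplied buffer whose capacity (in characters, terminator included) is passed as a parameter. The parameter names, the capacity convention and the -1 "buffer too small" return value are illustrative assumptions, not Paul's actual code. Because the two overloads take different parameter lists, the compiler picks the right one from the call.

```freebasic
' Hypothetical sketch; names and return conventions are illustrative only.

' ANSI version: FB has a native dynamic String, so the result is simply returned.
Function FF_Remove Overload (ByRef sMain As String, ByRef sMatch As String) As String
    If Len(sMatch) = 0 Then Return sMain
    Dim As String s = sMain
    Dim As Long i = InStr(s, sMatch)
    Do While i > 0
        s = Left(s, i - 1) & Mid(s, i + Len(sMatch))   ' drop one occurrence
        i = InStr(s, sMatch)                             ' rescan: removal can form new matches
    Loop
    Return s
End Function

' Unicode version: the caller passes the output buffer and its capacity in
' characters (terminating null included). Returns the number of characters
' written, or -1 if the buffer is missing or too small (it is left empty then).
Function FF_Remove Overload (ByRef wszMain As WString, ByRef wszMatch As WString, _
                             ByVal pwszOut As WString Ptr, ByVal nOutChars As Long) As Long
    If pwszOut = 0 OrElse nOutChars < 1 Then Return -1
    *pwszOut = ""
    ' The result is never longer than the input, so checking the input is enough
    If Len(wszMain) >= nOutChars Then Return -1
    *pwszOut = wszMain
    Dim As Long nMatch = Len(wszMatch)
    If nMatch = 0 Then Return Len(*pwszOut)
    Dim As Long i = InStr(*pwszOut, wszMatch)
    Do While i > 0
        *pwszOut = Left(*pwszOut, i - 1) & Mid(*pwszOut, i + nMatch)
        i = InStr(*pwszOut, wszMatch)
    Loop
    Return Len(*pwszOut)
End Function

' Overload resolution picks the version from the argument list
Print FF_Remove("[]Hello[]", "[]")                    ' ANSI: returns a String
Dim As WString * 16 wszBuf
Dim As Long nLen = FF_Remove("[]Hello[]", "[]", @wszBuf, 16)
Print wszBuf; " ("; nLen; " chars)"
```

The capacity parameter is what keeps the Unicode version safe: a BYREF WSTRING parameter does not tell the procedure how large the caller's buffer is.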

James Fuller

Paul,
  We REALLY NEED a native dynamic wide string type. I wonder if dkl can be bribed? :)
Your way is not acceptable to me. I prefer the BCX way with a static circular buffer.

James
Here is the FF_Remove for WStrings

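#define unicode
#include Once "windows.bi"
' Number of slots in the circular buffer; must be a power of two because
' the index is wrapped with AND (FbTmpWStrSize - 1)
#define FbTmpWStrSize 2048
'==============================================================================
' Returns a fresh, zeroed buffer for CharCount characters plus the terminator.
' The buffer is freed automatically FbTmpWStrSize calls later, so a result
' must be copied before that many further calls are made.
Function fbTmpWStr(ByVal CharCount As Long) As WString Ptr
    Static As Long StrCnt
    Static As WString Ptr WStrFunc(FbTmpWStrSize - 1)
    StrCnt = (StrCnt + 1) AND (FbTmpWStrSize - 1)
    If WStrFunc(StrCnt) Then
        ' Release the oldest buffer before reusing its slot
        Deallocate WStrFunc(StrCnt)
        WStrFunc(StrCnt) = NULL
    EndIf
    WStrFunc(StrCnt) = CAllocate(CharCount + 1, SizeOf(WString))
    Function = WStrFunc(StrCnt)
End Function
'==============================================================================
Function FF_Remove(Byval wsMain As WString Ptr, Byval wsMatch As WString Ptr) As WString Ptr
    Dim As Integer i
    ' Guard against NULL pointers before dereferencing them
    If wsMain = NULL OrElse wsMatch = NULL Then Return NULL
    If Len(*wsMain) = 0 OrElse Len(*wsMatch) = 0 Then Return NULL
    ' Work on a copy in a pool buffer; removal only shrinks the text, so it fits
    Dim As WString Ptr wsp = fbTmpWStr(Len(*wsMain))
    *wsp = *wsMain
    Do
        i = Instr(*wsp, *wsMatch)
        If i > 0 Then
            *wsp = Left(*wsp, i - 1) & Mid(*wsp, i + Len(*wsMatch))
        EndIf
    Loop Until i = 0
    Function = wsp
End Function
'==============================================================================
Dim As WString * 20 ws1 = "[]Hello[]"
Dim As WString * 4 ws2 = "[]"
Dim As WString Ptr wsp = FF_Remove(@ws1, @ws2)
If wsp Then ? *wsp

Sleep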




José Roca

Maybe one day you will let us know what WStrFunc does.

James Fuller

Jose,
  I thought it was fairly self-explanatory, but...
It is a static array of pointers to WStrings, which lets you return a WSTRING PTR from a function.
The array index increments on each call and rolls over after 2048 calls in this case, at which point the oldest buffer is freed and its slot reused.

James
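The rollover is easy to see with a pool shrunk to 4 slots. The following is a hypothetical, simplified copy of the fbTmpWStr idea (TmpWStr, POOL_SIZE and g_lastSlot are names made up for this demo) that records which slot each call uses: the fifth call lands in the same slot as the first one and therefore frees the first result.

```freebasic
' Simplified demo of the circular buffer with only 4 slots. POOL_SIZE must be
' a power of two, because the index is wrapped with AND (POOL_SIZE - 1).
#define POOL_SIZE 4

Dim Shared As Long g_lastSlot   ' slot used by the most recent call (demo only)

Function TmpWStr(ByVal nChars As Long) As WString Ptr
    Static As WString Ptr pool(0 To POOL_SIZE - 1)
    Static As Long idx
    idx = (idx + 1) And (POOL_SIZE - 1)                 ' wrap to 0..POOL_SIZE-1
    If pool(idx) Then Deallocate pool(idx)               ' frees the result from POOL_SIZE calls ago
    pool(idx) = CAllocate(nChars + 1, SizeOf(WString))   ' zeroed, +1 for the terminator
    g_lastSlot = idx
    Return pool(idx)
End Function

Dim As Long slots(1 To 6)
For k As Long = 1 To 6
    Dim As WString Ptr p = TmpWStr(8)
    *p = "call " & k
    slots(k) = g_lastSlot
    Print "call"; k; " -> slot"; slots(k)
Next
```

The size of each buffer is not limited; what is limited is how many results can be alive at the same time (POOL_SIZE).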

José Roca

A sort of string pool. Pray you never meet one of those users who work with strings of several gigabytes.

James Fuller

Jose,
  Yes, I know, but the concern is not the size; it's the number of strings in use at the same time.
And now for CBStr. This was a bit hairy and I'm not sure it's the best (or only) way to do it.
This one has an option to delete and free the allocations.
James


#define unicode
#include "afx/CBstr.inc"
' Number of slots; must be a power of two (the index is wrapped with AND)
#define CBStrTmpSize 16
'==============================================================================
' Returns a new CBStr from a circular pool. Pass DeleteFlag <> 0 to delete every
' CBStr still held by the pool (e.g. at program exit); NULL is returned then.
Function fbBstrTmp(ByVal DeleteFlag As Long = 0) As CBStr Ptr
    Static As CBStr Ptr CBStr_Tmp(CBStrTmpSize - 1)
    Static As Long CBStr_Tmp_Count
    If DeleteFlag Then
        ' The index wraps around, so scan every slot, not just 1..CBStr_Tmp_Count
        For i As Long = 0 To CBStrTmpSize - 1
            If CBStr_Tmp(i) Then
                Delete CBStr_Tmp(i)
                CBStr_Tmp(i) = NULL
            EndIf
        Next
        CBStr_Tmp_Count = 0
        Return NULL
    EndIf
    CBStr_Tmp_Count = (CBStr_Tmp_Count + 1) AND (CBStrTmpSize - 1)
    If CBStr_Tmp(CBStr_Tmp_Count) Then
        Delete CBStr_Tmp(CBStr_Tmp_Count)
        CBStr_Tmp(CBStr_Tmp_Count) = NULL
    EndIf
    ' Construct an actual CBStr object in this slot
    CBStr_Tmp(CBStr_Tmp_Count) = New CBStr
    Function = CBStr_Tmp(CBStr_Tmp_Count)
End Function
'==============================================================================
Function FF_Remove(ByVal cbsMain As CBStr Ptr, ByVal cbsMatch As CBStr Ptr) As CBStr Ptr
    Dim As Long i
    If cbsMain = NULL OrElse cbsMatch = NULL Then Return NULL
    If Len(*cbsMain) = 0 OrElse Len(*cbsMatch) = 0 Then Return NULL
    Dim As CBStr Ptr cbs = fbBstrTmp()
    ' Copy the input so the caller's BSTR is never modified in place
    *cbs = *cbsMain
    Dim As WString Ptr ws1
    Dim As WString Ptr ws2 = **cbsMatch
    Do
        ' Refresh the pointer: each assignment below reallocates the BSTR
        ws1 = **cbs
        i = Instr(*ws1, *ws2)
        If i > 0 Then
            *cbs = Left(*ws1, i - 1) & Mid(*ws1, i + Len(*ws2))
        EndIf
    Loop Until i = 0
    Function = cbs
End Function
'==============================================================================
Function FbMain() As Long
    Dim As CBStr Ptr ws = New CBStr("[]Hello[]")
    Dim As CBStr Ptr ws1 = New CBStr("[]")
    Dim As CBStr Ptr wsp = FF_Remove(ws, ws1)
    If wsp Then ? *wsp
    Delete ws
    Delete ws1

    Sleep

    ' Free every CBStr still held by the pool (wsp is no longer valid after this)
    fbBstrTmp(1)
    Function = 0
End Function
End FbMain()

José Roca

A little demo of what happens with FB unicode conversions:


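' ASCII-only text: a plain literal is fine
pWindow.AddControl("Button", , IDCANCEL, "&Close", 350, 150, 75, 23)

' WSTR depends on the local ANSI code page, so the Cyrillic text
' only comes out right where that code page is Russian (1251)
DIM wsz AS WSTRING * 260 = WSTR("&Закрыть")
pWindow.AddControl("Button", , IDCANCEL, wsz, 350, 200, 75, 23)

' AfxUCode is told explicitly that the text is in code page 1251
DIM cb AS CBSTR = AfxUCode("&Закрыть", 1251)
pWindow.AddControl("Button", , IDCANCEL, cb, 350, 250, 75, 23)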


On a Russian version of Windows, WSTR will probably work because the local ANSI code page will be Russian, but it doesn't work on a computer with a different local code page.

However, the version that uses AfxUCode("&Закрыть", 1251) should work on all systems.
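To illustrate why naming the code page explicitly helps, here is a deliberately simplified converter that always decodes bytes as Windows-1251, whatever the local code page is. It is an illustration only, not how AfxUCode is implemented: Cp1251ToWStr is a hypothetical name, and only ASCII, Ё/ё and the contiguous range &hC0..&hFF (А..я, U+0410..U+044F) are mapped; the remaining bytes in &h80..&hBF become U+FFFD in this sketch.

```freebasic
' Simplified sketch of a FIXED code page conversion (Windows-1251 -> WSTRING).
' A real converter uses the full 1251 table; this one maps only ASCII,
' the Cyrillic block &hC0..&hFF, and Ё/ё. The caller's buffer must hold
' at least Len(sAnsi) + 1 characters.
Sub Cp1251ToWStr(ByRef sAnsi As String, ByRef wszOut As WString)
    wszOut = ""
    For i As Long = 0 To Len(sAnsi) - 1
        Dim As Long b = sAnsi[i]
        Dim As Long cp
        Select Case b
        Case Is < &h80
            cp = b                          ' ASCII is identical
        Case &hC0 To &hFF
            cp = &h410 + (b - &hC0)         ' А..я = U+0410..U+044F
        Case &hA8
            cp = &h401                      ' Ё
        Case &hB8
            cp = &h451                      ' ё
        Case Else
            cp = &hFFFD                     ' not handled in this sketch
        End Select
        wszOut = wszOut & WChr(cp)
    Next
End Sub

' "&Закрыть" encoded as Windows-1251 bytes
Dim As String sBytes = Chr(&h26, &hC7, &hE0, &hEA, &hF0, &hFB, &hF2, &hFC)
Dim As WString * 32 wsz
Cp1251ToWStr(sBytes, wsz)
Print Len(wsz)
```

The same bytes give the same characters on every machine, because the table is fixed instead of being taken from the local ANSI code page.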

José Roca

> And now for CBStr. This was a bit hairy and I'm not sure it's the best/only way to do it.


' ========================================================================================
SUB FF_Remove(BYREF cbsMain AS CBSTR, BYREF cbsMatch AS CBSTR, BYREF cbsOut AS CBSTR)
   IF LEN(cbsMain) = 0 OR LEN(cbsMatch) = 0 OR VARPTR(cbsOut) = NULL THEN EXIT SUB
   cbsOut = cbsMain
   DIM i AS LONG
   DO
      i = INSTR(cbsOut, cbsMatch)
      IF i THEN
         cbsOut = LEFT(**cbsOut, i - 1) & MID(**cbsOut, i + LEN(cbsMatch))
      ENDIF
   LOOP UNTIL i = 0
END SUB
' ========================================================================================



DIM cbs1 AS CBSTR = CBSTR("[]Hello[]")
DIM cbs2 AS CBSTR = CBSTR("[]")
DIM cbsOut AS CBSTR
FF_Remove(cbs1, cbs2, cbsOut)
MessageBoxW 0, *cbsOut, "", MB_OK


We can also pass string literals directly:


DIM cbsOut AS CBSTR
FF_Remove("[]Hello[]", "[]", cbsOut)
MessageBoxW 0, *cbsOut, "", MB_OK


José Roca

Paul's suggestion:


' ========================================================================================
SUB FF_Remove(BYREF wszMain AS WSTRING, BYREF wszMatch AS WSTRING, BYREF wszOut AS WSTRING)
   IF LEN(wszMain) = 0 OR LEN(wszMatch) = 0 OR VARPTR(wszOut) = NULL THEN EXIT SUB
   wszOut = wszMain
   DIM i AS LONG
   DO
      i = INSTR(wszOut, wszMatch)
      IF i THEN
         wszOut = LEFT(wszOut, i - 1) & MID(wszOut, i + LEN(wszMatch))
      ENDIF
   LOOP UNTIL i = 0
END SUB
' ========================================================================================



DIM wszOut AS WSTRING * 260
FF_Remove("[]Hello[]", "[]", wszOut)
MessageBoxW 0, wszOut, "", MB_OK


The advantage of using the CBSTR version is that we don't need to know the size of the output string in advance, because it uses a dynamic BSTR, or rather an AFX_BSTR, since BSTR has been broken by the latest headers update (it is no longer declared as a pointer to a Unicode string, but as a pointer to a single Unicode character).

What? Not a single PTR parameter? This can't be FreeBASIC :)


José Roca

Overloading the procedure, we can use CBSTRs or WSTRINGs:


' ========================================================================================
FUNCTION FF_Remove OVERLOAD (BYREF cbsMain AS CBSTR, BYREF cbsMatch AS CBSTR, BYREF cbsOut AS CBSTR) AS BOOLEAN
   IF LEN(cbsMain) = 0 OR LEN(cbsMatch) = 0 OR VARPTR(cbsOut) = NULL THEN EXIT FUNCTION
   cbsOut = cbsMain
   DIM i AS LONG
   DO
      i = INSTR(cbsOut, cbsMatch)
      IF i THEN
         cbsOut = LEFT(**cbsOut, i - 1) & MID(**cbsOut, i + LEN(cbsMatch))
         FUNCTION = TRUE
      ENDIF
   LOOP UNTIL i = 0
END FUNCTION
' ========================================================================================

' ========================================================================================
FUNCTION FF_Remove OVERLOAD (BYREF wszMain AS WSTRING, BYREF wszMatch AS WSTRING, BYREF wszOut AS WSTRING) AS BOOLEAN
   IF LEN(wszMain) = 0 OR LEN(wszMatch) = 0 OR VARPTR(wszOut) = NULL THEN EXIT FUNCTION
   wszOut = wszMain
   DIM i AS LONG
   DO
      i = INSTR(wszOut, wszMatch)
      IF i THEN
         wszOut = LEFT(wszOut, i - 1) & MID(wszOut, i + LEN(wszMatch))
         FUNCTION = TRUE
      ENDIF
   LOOP UNTIL i = 0
END FUNCTION
' ========================================================================================


I think that this is a good solution. If we know in advance the maximum length of the output string, it is more efficient to use a WSTRING because there won't be further allocations/deallocations of memory. If we don't know it, then we can use a CBSTR.

José Roca

What we must not do is use overloaded versions of the + and & operators (I removed them from the CBSTR class), because they generate temporary BSTRs that are never freed. Instead, I'm using ** to get at the contents of the BSTR:

LEFT(**cbsOut, i - 1) & MID(**cbsOut, i + LEN(cbsMatch))

which generates temporary WSTRINGs that the compiler frees automatically.