
WinFBE 1.3.0 on GitHub (March 24, 2017)

Started by Paul Squires, March 24, 2017, 06:45:34 PM


Paul Squires

Ok, looks like I have all of this figured out now. Working on incorporating it into the editor. Hope to post EXE's soon.
Paul Squires
PlanetSquires Software
WinFBE Editor and Visual Designer

Paul Squires

...only one problem: there is an error that I cannot figure out. For UTF-16 Unicode files I need to convert the UTF-8 string in the Scintilla control to a UTF-16 string in order to write it to the disk file. The Scintilla control has been set to use SC_CP_UTF8.

Here is the code that I thought would work but does not. The second call returns 0 and GetLastError returns 1004 (ERROR_INVALID_FLAGS), which indicates that the flags argument is not valid.

Jose, maybe you know why it does not work?


' ========================================================================================
' Maps UTF-8 string to Unicode (UTF-16)
' ========================================================================================
FUNCTION Utf8ToUnicode(BYREF sUtf8 AS STRING) AS STRING
   dim sUnicode AS STRING
   dim dwLen    as long

   dwLen =  MultiByteToWideChar(CP_UTF8, _                 'Set to UTF8
                       0,                       _          'Conversion type
                       cast(LPCSTR, STRPTR(sUtf8)), _      'UTF8 string to convert
                       LEN(sUtf8), _                       'Length of UTF8 string
                       0,  _
                       0)   
   sUnicode = string(dwLen * 2, 0)

' The conversion below returns 0 so need to use GetLastError
print   MultiByteToWideChar(CP_UTF8, _                 'Set to UTF8
                       MB_COMPOSITE, _          'Conversion type
                       cast(LPCSTR, STRPTR(sUtf8)), _     'UTF8 string to convert
                       LEN(sUtf8), _              'Length of UTF8 string
                       cast(LPWSTR, STRPTR(sUnicode)), _  'Unicode string
                       len(sUnicode))             'Length of Unicode buffer

print "GetLastError = "; GetLastError()   ' this is code 1004

   FUNCTION = sUnicode

END FUNCTION


José Roca

Why don't you use AfxUcode? Even if your function worked, the returned value will be converted by FB to ansi automatically, since the return type is AS STRING.

If you don't want to use AfxUcode for some reason, you need to use and return a WSTRING pointer. First allocate a buffer and cast it to a WSTRING pointer, pass it to MultiByteToWideChar, return it as the result of the function, and later deallocate the memory.

Also don't use MB_COMPOSITE.
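
For readers following along, the pointer-based approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name Utf8ToWStringPtr is hypothetical (it is not part of WinFBE or the Afx framework), it assumes Windows, where a UTF-16 character is 2 bytes, and the caller is responsible for calling Deallocate on the returned pointer.

```freebasic
#include once "windows.bi"

' Hypothetical helper, not part of WinFBE or Afx.
' Converts a UTF-8 byte string to a newly allocated, null-terminated UTF-16
' buffer. Returns 0 on failure. The caller must Deallocate the result.
Function Utf8ToWStringPtr(ByRef sUtf8 As Const String) As WString Ptr
   If Len(sUtf8) = 0 Then Return 0

   ' First call: no output buffer, so the API returns the required size
   ' in UTF-16 characters. The flags argument is 0 (no MB_COMPOSITE).
   Dim cch As Long = MultiByteToWideChar(CP_UTF8, 0, StrPtr(sUtf8), Len(sUtf8), NULL, 0)
   If cch = 0 Then Return 0

   ' Allocate zero-filled room for the characters plus a terminating null.
   ' Assumption: 2 bytes per UTF-16 character (Windows).
   Dim pwsz As WString Ptr = CAllocate((cch + 1) * 2)
   If pwsz = 0 Then Return 0

   ' Second call: convert into the buffer; its size is given in characters.
   If MultiByteToWideChar(CP_UTF8, 0, StrPtr(sUtf8), Len(sUtf8), pwsz, cch) = 0 Then
      Deallocate(pwsz)
      Return 0
   End If

   Return pwsz
End Function

' Usage: "José" encoded as UTF-8 bytes (é is &hC3 &hA9).
Dim p As WString Ptr = Utf8ToWStringPtr("Jos" & Chr(&hC3, &hA9))
If p Then
   Print "UTF-16 characters: "; Len(*p)
   Deallocate(p)
End If
```

Returning a pointer avoids the automatic conversion that a WSTRING return value could trigger, at the cost of manual memory management. Returning the raw bytes in a STRING, or using a CWSTR, hands that bookkeeping back to the runtime.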

Paul Squires

To be honest, I thought I had tried AfxUcode and it still failed.... I will try again. I believe I used something like:

PUT #f,, **AfxUcode(*psz, CP_UTF8)

José Roca

Well, it will not be converted to ansi because the returned variable is a string, not a wstring.

Try this function:


' ========================================================================================
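PRIVATE FUNCTION AfxUcode2 (BYREF ansiStr AS CONST STRING) AS STRING
   ' First call: get the required size in UTF-16 characters (not bytes).
   DIM dwLen AS DWORD = MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), LEN(ansiStr), NULL, 0)
   IF dwLen THEN
      ' Each UTF-16 character takes 2 bytes, so the STRING buffer needs dwLen * 2 bytes.
      DIM s AS STRING = SPACE(dwLen * 2)
      ' The last argument is the buffer size in characters, so pass dwLen.
      dwLen = MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), LEN(ansiStr), CAST(WSTRING PTR, STRPTR(s)), dwLen)
      IF dwLen THEN RETURN s
   END IF
END FUNCTION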
' ========================================================================================


Paul Squires

This works:


               ' convert utf8 to utf16
               dim cws as CWSTR = AfxUcode(*psz, CP_UTF8)
               dim as byte ptr lpBuffer = Allocate(LEN(cws))
               lpBuffer = cws
               put #f, , lpBuffer[0], LEN(cws) * 2
               Deallocate(lpBuffer)

Paul Squires

Thanks Jose, your function worked perfectly :)

Here is the code I am using when saving the UTF-16 BOM encoded file.

         case FILE_ENCODING_UTF16_BOM
            ' Output the BOM first
            put #f, , chr(&HFF, &HFE)
            if sciCodePage = SC_CP_UTF8 THEN   
               ' convert utf8 to utf16
               put #f, , Utf8ToUnicode(*psz)
            else
               ' need to convert ansi to unicode
               put #f, , WStr(*psz)
            end if   


Paul Squires

New EXE's uploaded to GitHub. You will need to download the English.lang file as well because a new menu option, "File Encoding", has been added under the Edit menu.

WinFBE will attempt to determine the encoding of the file being loaded. You will notice that files containing words such as José will load as UTF-8 because of the é character.

You can change file encoding either by clicking on the encoding label in the status bar or by selecting the encoding from the "Edit", "File Encoding" option.

Please let me know if you run into any problems.

José Roca

Quote
dim as byte ptr lpBuffer = Allocate(LEN(cws))
lpBuffer = cws
put #f, , lpBuffer[0], LEN(cws) * 2
Deallocate(lpBuffer)

I don't see the need to allocate and deallocate lpBuffer. The assignment lpBuffer = cws does not copy the data; it is simply a cast, equivalent to cast(ANY PTR, m_pBuffer).


dim as byte ptr lpBuffer = cws
put #f, , lpBuffer[0], LEN(cws) * 2


José Roca

Maybe this will also work:


put #f, , cast(BYTE PTR, cws.m_pBuffer)[0], LEN(cws) * 2



Paul Squires

Thanks Jose, I ended up using the function that you posted for the sake of simplicity. I have posted the new EXE's. I will now wait to see how ganlinlao makes out with the new code and functionality.

José Roca

Above all, don't use


dim as byte ptr lpBuffer = Allocate(LEN(cws))
lpBuffer = cws
put #f, , lpBuffer[0], LEN(cws) * 2
Deallocate(lpBuffer)


lpBuffer = cws will assign a pointer to the CWSTR buffer to lpBuffer. Therefore, Deallocate(lpBuffer) will try to deallocate the CWSTR buffer and the buffer allocated with Allocate won't be deallocated.

ganlinlao

Wow, I'm glad to see the code changes. The encoding in the status bar can be switched freely between ANSI, UTF-8, and Unicode, and the file is saved in the corresponding encoding without any problems.
But the GetFileToString function has a problem: an existing Unicode text file cannot be read into a STRING variable. You must use a WSTRING or a WSTRING PTR.
     If it is a Unicode file:
         dim wsText as wstring*65536  ' or dim wsText as CWSTR; cannot use ansiStr as string
         if Open( wszFilename for input encoding "utf16" As #f ) = 0  then
            Get #f, , wsText
         end if
       
Thanks
ganlinlao

Paul Squires

Hi ganlinlao, thanks for the feedback. Here are the changes I am making to the code. If you can copy them into your code and test then that would be great. I will not be able to post new EXE's for a few more hours yet.
New version of UnicodeToUtf8 function:


' ========================================================================================
' Maps Unicode character string to a UTF-8 string.
' ========================================================================================
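FUNCTION UnicodeToUtf8(byval pswzUnicode as wstring ptr) AS STRING
   dim sUtf8 AS STRING
   dim dwLen as long

   ' First call: ask how many bytes the UTF-8 result needs. One UTF-16 character
   ' can take up to 3 bytes in UTF-8, so LEN(*pswzUnicode) bytes is not always enough.
   dwLen = WideCharToMultiByte(CP_UTF8, _                 'Set to UTF-8
                        0, _                       'Conversion type
                        cast(LPCWSTR, pswzUnicode), _  'Unicode string to convert
                        LEN(*pswzUnicode), _       'Length of Unicode string
                        0, 0, _                    'No buffer yet: returns required size
                        BYVAL 0, _                 'Invalid character replacement
                        BYVAL 0)                   'Replacement was used flag
   if dwLen = 0 then return ""

   ' Second call: convert into a buffer of exactly the required size.
   sUtf8 = string(dwLen, 0)
   WideCharToMultiByte(CP_UTF8, 0, cast(LPCWSTR, pswzUnicode), LEN(*pswzUnicode), _
                       cast(LPSTR, STRPTR(sUtf8)), dwLen, BYVAL 0, BYVAL 0)
   FUNCTION = sUtf8

END FUNCTION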


Replace this portion of GetFileToString (around line 300 of modRoutines.inc)


      case FILE_ENCODING_UTF16_BOM
         ' Convert to UTF8 so it can display in the editor
         ' Need to pass a WSTRING pointer to the conversion function.
         txtBuffer = UnicodeToUtf8( cast(WSTRING ptr, strptr(ansiStr)) )


Basically, I am mapping the raw binary received from loading the file to a WSTRING PTR. I am then feeding that pointer to the new UnicodeToUtf8.

Thanks,
Paul




Paul Squires

I have also added code that sets the buffer to dirty whenever a user changes from one file encoding to another. This ensures that the file will be saved with the new encoding.