Hi Jose,
I have a generic function that is supposed to read all bytes from a disk file into memory. I then manipulate those bytes to determine whether the file is ANSI, UTF-8, or UTF-16. The result will be a UTF-8 encoded string in txtBuffer. Do OPEN and GET into a STRING variable in this context mangle the bytes with automatic conversions, as happens when writing to a file? Is there a better way to do this? Maybe the straight API with CreateFile?
function GetFileToString( byref wszFilename as const wstring, byref txtBuffer as string, byval pDoc as clsDocument ptr) as boolean
   ' Load the entire file into a string
   dim as long f = freefile
   If Open( wszFilename for Binary Access Read As #f ) = 0 Then
      If LOF(f) > 0 Then
         txtBuffer = String(LOF(f), 0)
         Get #f, , txtBuffer     '<--- could this be a problem?
      End If
   else
      return true   ' error opening file
   end if
   close #f

   ' Check for BOM signatures
   if left(txtBuffer, 3) = chr(&HEF, &HBB, &HBF) THEN
      ' UTF8 BOM encoded
      pDoc->FileEncoding = FILE_ENCODING_UTF8_BOM
      txtBuffer = mid(txtBuffer, 4)   ' bypass the BOM
   elseif left(txtBuffer, 2) = chr(&HFF, &HFE) THEN
      ' UTF16 BOM (little endian) encoded
      pDoc->FileEncoding = FILE_ENCODING_UTF16_BOM
      txtBuffer = mid(txtBuffer, 3)   ' bypass the BOM
   else
      pDoc->FileEncoding = FILE_ENCODING_ANSI
   END IF

   select case pDoc->FileEncoding
      case FILE_ENCODING_ANSI
         ' No conversion needed. clsDocument ApplyProperties will *not*
         ' set the editor to UTF8 code.
      case FILE_ENCODING_UTF8_BOM
         ' No conversion needed. clsDocument ApplyProperties will set
         ' the editor to UTF8 code.
      case FILE_ENCODING_UTF16_BOM
         ' Convert to UTF8 so it can display in the editor
         ' Need to pass a WSTRING pointer to the conversion function.
         txtBuffer = UnicodeToUtf8( cast(WSTRING ptr, strptr(txtBuffer)) )
   END select

   function = false
END FUNCTION
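One way to check the GET question directly is a round-trip test. This is only a rough sketch, not part of the original code: it writes every possible byte value to a scratch file with PUT, reads it back with GET into a STRING, and counts the differences. The file name roundtrip.bin is made up for illustration and is assumed not to exist beforehand.

```freebasic
' Round-trip check (sketch): write every byte value, read it back, compare.
const TEST_FILE = "roundtrip.bin"   ' hypothetical scratch file name

dim as string src = string(256, 0), dst
for i as long = 0 to 255
    src[i] = i                      ' bytes 0..255, including 0 and values >= 128
next

dim as long f = freefile
if open(TEST_FILE for binary access write as #f) <> 0 then end 1
put #f, , src
close #f

f = freefile
if open(TEST_FILE for binary access read as #f) <> 0 then end 1
dst = string(lof(f), 0)             ' same sizing approach as GetFileToString
get #f, , dst
close #f

' Compare byte by byte so embedded nulls cannot hide a difference.
dim as long mismatches = 0
if len(dst) <> len(src) then
    mismatches = -1                 ' length differs, no point comparing bytes
else
    for i as long = 0 to len(src) - 1
        if src[i] <> dst[i] then mismatches += 1
    next
end if
print "mismatches: "; mismatches

kill TEST_FILE                      ' remove the scratch file
```

If binary PUT and GET pass the bytes through untouched, the count should come out as zero. Any other value would point to a conversion happening somewhere.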
Here is code I adapted from Jose's AfxFileScan routine that appears to correctly load an entire file into a simple STRING variable.
DIM dwCount AS DWORD, dwFileSize AS DWORD, dwHighSize AS DWORD, dwBytesRead AS DWORD
DIM hFile AS HANDLE = CreateFileW(@wszFileName, GENERIC_READ, FILE_SHARE_READ, NULL, _
                                  OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL)
IF hFile = INVALID_HANDLE_VALUE THEN return true
dwFileSize = GetFileSize(hFile, @dwHighSize)
txtBuffer = String(dwFileSize, 0)
DIM bSuccess AS LONG = ReadFile(hFile, strptr(txtBuffer), dwFileSize, @dwBytesRead, NULL)
CloseHandle(hFile)
IF bSuccess = FALSE THEN return true
There would be a problem if you were calling UnicodeToUtf8(txtBuffer) directly, because the STRING would be converted automatically, but since you're passing a pointer obtained with STRPTR I don't expect problems.
I can't say the same for OPEN(wszFilename). OPEN does not support WSTRINGs, so wszFilename will be converted to ANSI and it won't work with Unicode file names. This is why I'm using CreateFileW.
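One caveat about the STRPTR cast discussed above, offered as a possibility rather than a confirmed bug: a STRING is only guaranteed a single null byte at the end, while code that reads through a WSTRING ptr looks for a full 16-bit null. A minimal sketch of a defensive fix is to append two null bytes before the cast. The variable names and the sample bytes here are made up for illustration.

```freebasic
' Sketch: reinterpret raw UTF-16LE bytes held in a STRING as a WSTRING.
' Assumes Windows, where a WSTRING character is 2 bytes wide.
dim as string raw = chr(&H48, 0, &H69, 0)   ' "Hi" as UTF-16LE, BOM already removed

' Assumption: an explicit 16-bit terminator is added for safety, because the
' STRING itself only guarantees one trailing null byte.
raw &= chr(0, 0)

' No conversion happens here. The same bytes are simply viewed as UTF-16.
dim as wstring ptr pw = cast(wstring ptr, strptr(raw))
print len(*pw)                              ' code units before the terminator
print *pw
```

In GetFileToString the same idea would mean appending chr(0, 0) to txtBuffer just before the call to UnicodeToUtf8.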