Hi Jose,
I have a generic function that is supposed to read all bytes from a disk file into memory. I then manipulate those bytes to determine whether the file is ANSI, UTF-8, or UTF-16. The result will be a UTF-8 encoded string in txtBuffer. Do OPEN and GET into a STRING variable, in this context, screw up the bytes with automatic conversions like they do when writing to a file? Is there a better way to do this? Maybe the straight API with CreateFile?
function GetFileToString( byref wszFilename as const wstring, byref txtBuffer as string, byval pDoc as clsDocument ptr) as boolean
   ' Load the entire file into a string
   dim as long f = freefile
   If Open( wszFilename for Binary Access Read As #f ) = 0 Then
      If LOF(f) > 0 Then
         txtBuffer = String(LOF(f), 0)
         Get #f, , txtBuffer    '<--- could this be a problem?
      End If
   else
      return true   ' error opening file
   end if
   close #f

   ' Check for BOM signatures
   if left(txtBuffer, 3) = chr(&HEF, &HBB, &HBF) THEN
      ' UTF8 BOM encoded
      pDoc->FileEncoding = FILE_ENCODING_UTF8_BOM
      txtBuffer = mid(txtBuffer, 4)   ' bypass the BOM
   elseif left(txtBuffer, 2) = chr(&HFF, &HFE) THEN
      ' UTF16 BOM (little endian) encoded
      pDoc->FileEncoding = FILE_ENCODING_UTF16_BOM
      txtBuffer = mid(txtBuffer, 3)   ' bypass the BOM
   else
      pDoc->FileEncoding = FILE_ENCODING_ANSI
   END IF

   select case pDoc->FileEncoding
      case FILE_ENCODING_ANSI
         ' No conversion needed. clsDocument ApplyProperties will *not*
         ' set the editor to UTF8 code.
      case FILE_ENCODING_UTF8_BOM
         ' No conversion needed. clsDocument ApplyProperties will set
         ' the editor to UTF8 code.
      case FILE_ENCODING_UTF16_BOM
         ' Convert to UTF8 so it can display in the editor
         ' Need to pass a WSTRING pointer to the conversion function.
         txtBuffer = UnicodeToUtf8( cast(WSTRING ptr, strptr(txtBuffer)) )
   END select

   function = false
END FUNCTION
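
For comparison, here is a minimal sketch of one alternative to the STRING buffer: reading the file into a UBYTE array, so that no string type is involved in the GET at all. ReadFileBytes is a hypothetical helper name and is not part of the code above. It keeps the same convention of returning true on error, and I have not verified it against every compiler version.

```freebasic
' Hypothetical helper (not in the original code): read the raw bytes of a
' file into a dynamic UBYTE array instead of a STRING.
' Returns true on error and false on success, like GetFileToString above.
function ReadFileBytes( byref wszFilename as const wstring, buffer() as ubyte ) as boolean
   dim as long f = freefile
   if open( wszFilename for binary access read as #f ) <> 0 then
      return true   ' error opening file
   end if

   dim as longint nSize = lof(f)
   if nSize > 0 then
      redim buffer(0 to nSize - 1)
      get #f, , buffer()   ' fills the whole array with the file's bytes
   else
      erase buffer         ' empty file: leave the array unallocated
   end if

   close #f
   return false
end function

' Usage: load the bytes, then test for the UTF-8 BOM on the raw values.
dim bytes() as ubyte
if ReadFileBytes( "test.txt", bytes() ) = false then
   if ubound(bytes) >= 2 then
      if bytes(0) = &HEF andalso bytes(1) = &HBB andalso bytes(2) = &HBF then
         print "UTF-8 BOM found"
      end if
   end if
end if
```

With this approach the BOM checks compare byte values directly, and the data would only be copied into a STRING or WSTRING once the encoding is known.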