Hi Jose,
I have attached for your consideration a new Afx function that attempts to detect the encoding of a file. The attachment is a project with sample files that shows the function in action. I have occasionally needed to know the exact encoding of a text file, not just whether it is unicode (AfxIsFileUnicode).
Here is the function:
'//
'// From the unicode.org FAQ:
'//
'// 00 00 FE FF UTF-32, big-endian
'// FF FE 00 00 UTF-32, little-endian
'// FE FF UTF-16, big-endian
'// FF FE UTF-16, little-endian
'// EF BB BF UTF-8
'//
'// Match the first x bytes of the file against the
'// Byte-Order-Mark (BOM) lookup table
'//
private function AfxGetFileEncoding( byref wszFilename as wstring ) as Integer
   type _BOM_LOOKUP
      bom   as DWORD
      nlen  as ulong
      ntype as Integer
   end type
   '// define longest headers first
   '// each bom value is the BOM byte sequence as it reads when loaded
   '// into a DWORD on a little-endian CPU (first file byte = lowest byte)
   static BOMLOOK(...) as _BOM_LOOKUP = _
   {( &H0000FEFF, 4, NCP_UTF32 ), _
   ( &HFFFE0000, 4, NCP_UTF32BE ), _
   ( &HBFBBEF, 3, NCP_UTF8 ), _
   ( &HFFFE, 2, NCP_UTF16BE ), _
   ( &HFEFF, 2, NCP_UTF16 ), _
   ( 0, 0, NCP_ASCII ) _
   }
   DIM as DWORD dwBytesRead
   DIM as HANDLE hFile
   dim as BYTE header(4)
   dim as Integer nEncoding = NCP_ASCII   '// default to ASCII
   hFile = CreateFile( @wszFileName, GENERIC_READ, FILE_SHARE_READ, NULL, _
                       OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL)
   IF hFile <> INVALID_HANDLE_VALUE THEN
      if ReadFile( hFile, @header(0), 4, @dwBytesRead, NULL ) <> 0 then
         for i as long = lbound(BOMLOOK) to ubound(BOMLOOK)
            if dwBytesRead >= BOMLOOK(i).nlen then
               if memcmp( @header(0), @BOMLOOK(i).bom, BOMLOOK(i).nlen ) = 0 then
                  nEncoding = BOMLOOK(i).ntype
                  exit for   '// leave the loop so the handle still gets closed
               end if
            end if
         next
      end if
      CloseHandle(hFile)
   end if
   return nEncoding
end function
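The constants in the lookup table look reversed compared with the byte sequences listed in the FAQ comment because memcmp compares the header against the DWORD as it is stored in memory. A minimal sketch of that, assuming a little-endian CPU (x86/x64) and the FreeBASIC CRT header crt/string.bi; the variable names are illustrative only:

```freebasic
#include once "crt/string.bi"   ' declares memcmp

' UTF-8 BOM exactly as it appears at the start of a file (EF BB BF),
' padded with one zero byte to fill four bytes.
dim as ubyte header(0 to 3) = { &HEF, &HBB, &HBF, 0 }

' The same three bytes written as a 32-bit constant. On a little-endian
' CPU the lowest-order byte (&HEF) is stored first in memory, so the
' constant reads "backwards" relative to the file.
dim as ulong bom = &HBFBBEF

if memcmp( @header(0), @bom, 3 ) = 0 then
   print "BOM matches"
else
   print "no match"
end if
```

On a big-endian machine the same constants would not match, so the table as written is tied to the byte order of the Windows targets it runs on.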
Here is the example code (sample text files are also in the attachment):
#define unicode
#include once "Afx\AfxWin.inc"
'//
'// currently supported codepages
'//
#define NCP_ASCII 0
#define NCP_UTF8 1
#define NCP_UTF16 2
#define NCP_UTF16BE 3
#define NCP_UTF32 4
#define NCP_UTF32BE 5
#include once "AfxGetFileEncoding.inc"
' ========================================================================================
' MAIN PROGRAM ENTRY POINT
' ========================================================================================
' Test all of the sample files in the "samples" subfolder
DIM as HANDLE hSearch
dim AS WIN32_FIND_DATA WFD
dim as CWSTR wszFilename, wszFileType, wszPath
dim as Boolean IsUnicode
wszPath = AfxGetExePathName + "samples\"
hSearch = FindFirstFile( wszPath + "*.txt", @WFD )
IF hSearch <> INVALID_HANDLE_VALUE THEN
   DO
      IF (WFD.dwFileAttributes AND FILE_ATTRIBUTE_DIRECTORY) <> FILE_ATTRIBUTE_DIRECTORY THEN
         wszFilename = wszPath & WFD.cFileName
         select case AfxGetFileEncoding( wszFilename )
            case NCP_UTF8
               wszFileType = "NCP_UTF8": IsUnicode = true
            case NCP_UTF16
               wszFileType = "NCP_UTF16": IsUnicode = true
            case NCP_UTF16BE
               wszFileType = "NCP_UTF16BE": IsUnicode = true
            case NCP_UTF32
               wszFileType = "NCP_UTF32": IsUnicode = true
            case NCP_UTF32BE
               wszFileType = "NCP_UTF32BE": IsUnicode = true
            case NCP_ASCII
               ' Reset the flag so a previous file's result does not carry over.
               wszFileType = "NCP_ASCII": IsUnicode = false
               ' If no BOM exists then it is possible that the file still contains
               ' unicode characters. We can test for that using AfxIsFileUnicode.
               ' We would only do this test when we need greater certainty that
               ' the file contains unicode text. This is a more expensive test
               ' because the whole file has to be read into memory in order to
               ' be analyzed.
               if AfxIsFileUnicode( wszFilename ) then IsUnicode = true
         end select
         ? "Encoding: "; wszFileType, "IsUnicode: "; IsUnicode, "Filename: "; AfxStrPathName( "NAME", wszFilename)
      END IF
   LOOP WHILE FindNextFile(hSearch, @WFD)
   FindClose(hSearch)
END IF
sleep
I'm having timeout problems again.
If you need that function, I will include it.
However, you're wrong in the assumption that AfxIsFileUnicode has to analyze the whole file. It just reads and analyzes the first 1024 bytes.
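To illustrate why a bounded read keeps such a check cheap, here is a rough sketch of a test that looks only at the first 1024 bytes. It is not the Afx implementation: the function name SketchIsFileUnicode is hypothetical, and the use of the Win32 IsTextUnicode heuristic is an assumption made for the example.

```freebasic
#define unicode
#include once "windows.bi"

' Hypothetical helper for illustration only; not the AfxIsFileUnicode source.
private function SketchIsFileUnicode( byref wszFileName as wstring ) as boolean
   dim as ubyte buffer(0 to 1023)      ' only the first 1024 bytes are examined
   dim as DWORD dwBytesRead
   dim as boolean bResult = false
   dim as HANDLE hFile = CreateFile( @wszFileName, GENERIC_READ, FILE_SHARE_READ, NULL, _
                                     OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL )
   if hFile = INVALID_HANDLE_VALUE then return false
   if ReadFile( hFile, @buffer(0), 1024, @dwBytesRead, NULL ) <> 0 then
      if dwBytesRead > 0 then
         ' Assumption: let the Win32 heuristic decide. Passing NULL as the
         ' last argument asks IsTextUnicode to run all of its tests.
         bResult = cbool( IsTextUnicode( @buffer(0), dwBytesRead, NULL ) <> 0 )
      end if
   end if
   CloseHandle( hFile )
   return bResult
end function
```

Because at most 1024 bytes are read, the cost does not grow with the file size. Note that IsTextUnicode is a statistical test aimed at UTF-16 text and can misjudge short inputs.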
Ah yes, right you are. I should have looked at the code more closely: only the first 1K is read, not the whole file when it is larger than 1K.
Sorry that you are having timeout problems. I wish I knew what causes that.
I went into the forum admin settings and changed the following server setting:
Seconds before an unused session timeout
I changed the value to 14400 because I saw that mentioned somewhere on the web. Maybe this will make a difference for you with that session timeout problem.
It is a strange problem. Yesterday I was only able to connect once. Today it is working fine. And it only happens with this site.
Thanks Jose, please let me know if accessing the forum continues to cause you problems. I'll try to search for more solutions if it does.