Manifests

Started by Michael Stefanik, November 10, 2009, 02:05:22 PM

Michael Stefanik

For Paul and anyone else interested, I've attached functions that convert between ANSI and UTF-8 strings. I'm presuming that FF3 was written in PowerBASIC, so I thought these might help.

Edit: Fixed a minor issue where more memory was being allocated for the ANSI string than was really necessary.
Mike Stefanik
sockettools.com

José Roca


' ========================================================================================
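' Converts an ANSI string to a UTF-8 encoded string.
' Each byte value is used directly as its Unicode code point, so characters
' 128-255 are treated as ISO-8859-1 (Latin-1).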
' ========================================================================================
FUNCTION AnsiToUtf8 (BYVAL strAnsi AS STRING) AS STRING

   LOCAL i AS LONG                ' // Loop counter
   LOCAL strUtf8 AS STRING        ' // UTF-8 encoded string
   LOCAL idx AS LONG              ' // Position in the string
   LOCAL c AS LONG                ' // Character code
   LOCAL b2 AS LONG               ' // Second byte

   IF LEN(strAnsi) = 0 THEN EXIT FUNCTION

   ' // The maximum length of the translated string will be
   ' // twice the length of the original string.
   ' // We are pre-allocating the buffer for faster operation
   ' // than concatenating each character one by one.
   strUtf8 = SPACE$(LEN(strAnsi) * 2)

   ' // Initialize index position in the string buffer
   ' // used to store the UTF-8 encoded string
   idx = 1

   ' // Examine each character in the ANSI string
   FOR i = 1 TO LEN(strAnsi)
      ' // Get the character code
      c = ASC(MID$(strAnsi, i, 1))
      ' // If it is between 0 and 127...
      IF c < 128 THEN
         ' // ...we simply copy it to the string buffer...
         MID$(strUtf8, idx, 1) = MID$(strAnsi, i, 1)
         ' // ...and increase the position by 1.
         idx = idx + 1
      ELSE
         ' // We need to split the character into two bytes.
         ' // For the second byte, we only need the lower six bits of the character,
         ' // and to ensure that the two upper bits will be 10 (in binary),
         ' // i.e. (00111111 AND xxxxxxxx) OR 10000000
         b2 = (c AND &H3F) OR &H80
         ' // For the first byte, we need only the upper two bits from the character,
         ' // and to ensure that the three upper bits will be 110 (in binary).
         SHIFT RIGHT c, 6
         c = c OR &HC0
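         ' // Worked example (illustrative): "é" has code 233 (&HE9, binary 11101001).
         ' //   b2 = (&HE9 AND &H3F) OR &H80 = &H29 OR &H80 = &HA9
         ' //   c  = (&HE9 shifted right 6) OR &HC0 = &H03 OR &HC0 = &HC3
         ' // so the character is stored as the byte pair &HC3 &HA9.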
         ' // Copy the bytes to the buffer string and increase the index position by 2.
         MID$(strUtf8, idx, 2) = CHR$(c, b2)
         idx = idx + 2
      END IF
   NEXT

   ' // Return the encoded string
   FUNCTION = LEFT$(strUtf8, idx - 1)

END FUNCTION
' ========================================================================================

' ========================================================================================
' Converts a UTF-8 encoded string to an ANSI string.
' ========================================================================================
FUNCTION Utf8ToAnsi (BYVAL strUtf8 AS STRING) AS STRING

   LOCAL i AS LONG                ' // Loop counter
   LOCAL strAnsi AS STRING        ' // ANSI string
   LOCAL idx AS LONG              ' // Position in the string
   LOCAL c AS LONG                ' // Character code
   LOCAL b2 AS LONG               ' // Second byte
   LOCAL fSkipChar AS LONG        ' // Flag

   IF LEN(strUtf8) = 0 THEN EXIT FUNCTION

   ' // The maximum length of the translated string will be
   ' // the same as the length of the original string.
   ' // We are pre-allocating the buffer for faster operation
   ' // than concatenating each character one by one.
   strAnsi = SPACE$(LEN(strUtf8))

   ' // Initialize index position in the string buffer
   ' // used to store the converted ANSI string
   idx = 1

   ' // Examine each byte in the UTF-8 encoded string
   FOR i = 1 TO LEN(strUtf8)
      ' // If fSkipChar is set we have to skip this character
      IF fSkipChar THEN
         fSkipChar = 0
         ITERATE FOR
      END IF
      ' // Get the byte value
      c = ASC(MID$(strUtf8, i, 1))
      ' // If it is between 0 and 127...
      IF c < 128 THEN
         ' // ...we simply copy it to the string buffer...
         MID$(strAnsi, idx, 1) = MID$(strUtf8, i, 1)
         ' // ...and increase the position by 1.
         idx = idx + 1
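      ELSEIF c = &HC2 OR c = &HC3 THEN
         ' // Lead byte of a two-byte sequence that decodes to 128-255,
         ' // the only multi-byte values that fit in a single ANSI byte.
         ' // Any other byte above 127 has no ANSI equivalent and is dropped.
         ' // We need to join this byte and the next byte.
         b2 = ASC(MID$(strUtf8, i + 1, 1))
         ' // The next byte must be a continuation byte (10xxxxxx in binary).
         IF (b2 AND &HC0) = &H80 THEN
            c = (c - 192) * 64 + (b2 - 128)
            MID$(strAnsi, idx, 1) = CHR$(c)
            ' // Set the flag to skip the next character
            fSkipChar = %TRUE
            ' // Increase the position by 1.
            idx = idx + 1
         END IF
      END IF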
   NEXT

   ' // Return the converted ANSI string
   FUNCTION = LEFT$(strAnsi, idx - 1)

END FUNCTION
' ========================================================================================
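
' ========================================================================================
' Usage sketch (not part of the original post): a round trip through both functions.
' Assumes the two functions above are in the same source file. PBMAIN, the test
' string and the message text are illustrative only.
' ========================================================================================
FUNCTION PBMAIN () AS LONG

   LOCAL strAnsi AS STRING        ' // Original ANSI string
   LOCAL strUtf8 AS STRING        ' // UTF-8 encoded copy

   ' // "café": the last character has code 233 in the ANSI code page
   strAnsi = "caf" + CHR$(233)
   strUtf8 = AnsiToUtf8(strAnsi)

   ' // The accented character becomes two bytes (&HC3 &HA9), so the
   ' // UTF-8 string should be one byte longer than the ANSI string.
   MSGBOX "ANSI length:" + STR$(LEN(strAnsi)) + $CRLF + _
          "UTF-8 length:" + STR$(LEN(strUtf8)) + $CRLF + _
          "Round trip matches: " + IIF$(Utf8ToAnsi(strUtf8) = strAnsi, "yes", "no")

END FUNCTION
' ========================================================================================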


Paul Squires

I just heard back from Jean-Pierre. The manifest problem is fixed now: converting the ANSI strings with the UTF-8 function did the trick. Thanks, guys.
Paul Squires
PlanetSquires Software