DWString problem help

Started by fbfans, February 20, 2026, 12:59:54 PM


fbfans

hi José!
I'm learning DWString and I have some confusion about its behavior with different source file encodings.
From the documentation, my understanding is:
DWString stores text internally in Unicode
We can get .ansi, .utf8, .ptr, etc., depending on the usage scenario
But I get different results when the source file is saved as ANSI vs UTF‑8:
1. When source file is ANSI
This code displays Chinese correctly:
dim as dwstring txt  = "围棋"
plutovg_text(pluto, txt.utf8, 32, 128)
plutovg_fill(pluto)
2. When source file is UTF‑8
The same direct assignment does NOT work correctly (garbled text):
dim as dwstring txt = "你好 围棋"
plutovg_text(pluto, txt.utf8, 330, 128)
plutovg_fill(pluto)
I have to use this instead to display properly:
DIM AS DWSTRING dtxt = dwstring("你好 围棋", CP_UTF8)
plutovg_text(pluto, dtxt.utf8, 330, 128)
plutovg_fill(pluto)
From an earlier example:
Quote: Utf8ToAnsi
DIM dws AS DWSTRING = DWSTRING(utf8Str, CP_UTF8)
DIM ansiStr AS STRING = dws
My questions:
Does utf8Str here mean a UTF‑8 string?
When the source file is saved as UTF‑8, the literal string inside "..." should already be UTF‑8.
So what exactly does DWSTRING(utf8Str, CP_UTF8) do internally?
Why do I need to explicitly specify CP_UTF8 when the file is already UTF‑8?

#define unicode
#include "plutovg.bi"
#include "afxnova/Dwstring.inc"
using afxnova

' draw vector fonts on an FBGFX image

const as long iWidth  = 640
const as long iHeight = 480
const as double FONT_SIZE = 40

screenres iWidth,iHeight,32

var img = ImageCreate(iWidth, iHeight, 0)
dim as ubyte ptr imgPixels
dim as long imgWidth,imgHeight,imgBytes,imgPitch
ImageInfo(img, imgWidth, imgHeight, imgBytes, imgPitch, imgPixels)

var surface = plutovg_surface_create_for_data(imgPixels,imgWidth,imgHeight,imgPitch)
var pluto   = plutovg_create(surface)
var font    = plutovg_font_load_from_file("c:/windows/fonts/simhei.ttf",FONT_SIZE)
plutovg_set_font(pluto,font)
plutovg_set_source_rgb(pluto, 1, 1, 0)

dim as wstring * 8 txt = "你好 世界"

plutovg_text(pluto, txt, 20, 128)
'plutovg_fill(pluto)

DIM AS DWSTRING dtxt = dwstring("你好 围棋", CP_UTF8)
plutovg_text(pluto, dtxt.utf8, 330, 128)
plutovg_fill(pluto)

plutovg_text(pluto, txt, 32, 256)
plutovg_text(pluto, "你好 中国", 330, 256)
plutovg_stroke(pluto)

plutovg_save(pluto)
plutovg_text(pluto, txt, 90, 344)
plutovg_text(pluto, dtxt.utf8, 390, 344)
plutovg_rotate(pluto,.2)
plutovg_fill(pluto)
plutovg_restore(pluto)

put (0,0),img,PSET 'ALPHA
sleep
plutovg_surface_destroy(surface)
plutovg_destroy(pluto)
plutovg_font_destroy(font)

thanks!

José Roca

> Does utf8Str here mean a UTF‑8 string?
> The same direct assignment does NOT work correctly (garbled text):
> I have to use this instead to display properly:

As FreeBasic does not have a separate data type for utf-8 (it only has STRING), if you don't specify that it is utf-8 with CP_UTF8, the constructor assumes that you're passing an ansi string and does the conversion using the CP_ACP code page. Keep in mind that a utf-8 string is just a sequence of bytes, and every one of those bytes is also a valid ANSI character, so the constructor cannot tell the two apart on its own.
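To make the point above concrete, here is a minimal sketch that uses only the Windows API, not DWSTRING. It decodes the same six bytes once as utf-8 and once with the system ANSI code page. The helper name CountWideChars and the sample bytes (the utf-8 encoding of "你好") are illustrative assumptions, not part of AfxNova.

```freebasic
#include once "windows.bi"

' Hypothetical helper (not part of AfxNova): returns the number of UTF-16
' code units produced when the bytes of s are decoded with codePage,
' excluding the terminating null; returns 0 on failure.
function CountWideChars(byref s as string, byval codePage as UINT) as long
   dim as long n = MultiByteToWideChar(codePage, 0, strptr(s), -1, NULL, 0)
   if n = 0 then return 0
   return n - 1
end function

' UTF-8 bytes of "你好" (U+4F60 U+597D), built with CHR so that the
' result does not depend on how this source file is saved.
dim as string utf8Bytes = chr(&hE4, &hBD, &hA0, &hE5, &hA5, &hBD)

print "bytes in the STRING: "; len(utf8Bytes)
print "decoded as CP_UTF8:  "; CountWideChars(utf8Bytes, CP_UTF8)
' The CP_ACP result depends on the system code page, so it is only printed.
print "decoded as CP_ACP:   "; CountWideChars(utf8Bytes, CP_ACP)
```

With CP_UTF8 the six bytes become two UTF-16 code units, one per Chinese character. With CP_ACP the same bytes are read according to the system code page, which gives different characters and is the source of the garbled text.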

The constructor calls this method:

PRIVATE FUNCTION DWSTRING.Add (BYREF ansiStr AS STRING, BYVAL nCodePage AS UINT = 0) AS BOOLEAN
   DWSTRING_DP("STRING - buffer: " & ..WSTR(m_pBuffer) & " - codepage: " & ..WSTR(nCodePage))
   IF .LEN(ansiStr) = 0 THEN RETURN FALSE
   ' // Create the wide string from the incoming ansi string
   DIM dwLen AS UINT, pbuffer AS ANY PTR
   DIM bRes AS BOOLEAN = TRUE   ' // assume success for now
   IF nCodePage = CP_UTF8 THEN   ' // check if it is really valid utf-8
      IF this.IsUtf8(ansiStr) = FALSE THEN nCodePage = CP_ACP
   END IF
   IF nCodePage = CP_UTF8 THEN
      dwLen = MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), -1, NULL, 0)
      IF dwLen = 0 THEN RETURN FALSE
      pbuffer = Allocate(dwLen * 2)
      dwLen = MultiByteToWideChar(CP_UTF8, 0, STRPTR(ansiStr), -1, pbuffer, dwLen)
      IF dwLen = 0 THEN bRes = FALSE
   ELSE
      dwLen = MultiByteToWideChar(nCodePage, MB_PRECOMPOSED, STRPTR(ansiStr), -1, NULL, 0)
      IF dwLen = 0 THEN RETURN FALSE
      pbuffer = Allocate(dwLen * 2)
      dwLen = MultiByteToWideChar(nCodePage, MB_PRECOMPOSED, STRPTR(ansiStr), -1, pbuffer, dwLen)
      IF dwLen = 0 THEN bRes = FALSE
   END IF
   IF bRes = FALSE THEN
      IF pBuffer THEN Deallocate(pbuffer)
      RETURN bRes
   END IF
   ' // Copy the string into the buffer
   IF pbuffer THEN
      ' Copy the string into the buffer and update the length
      bRes = this.AppendBuffer(pbuffer, dwLen)
      ' // Deallocate the buffer
      IF pBuffer THEN Deallocate(pbuffer)
   END IF
   RETURN bRes
END FUNCTION

That checks if it is really utf-8 only if you have specified CP_UTF8.

   IF nCodePage = CP_UTF8 THEN   ' // check if it is really valid utf-8
      IF this.IsUtf8(ansiStr) = FALSE THEN nCodePage = CP_ACP
   END IF

With the utf-8 property you don't need to specify CP_UTF8 because it assumes that it is a utf-8 string.

BTW, you're saying "When source file is ANSI", but you can't have dim as wstring * 8 txt = "你好 世界" in an ANSI file. It must be utf-16 (Unicode). If you write "你好 世界" in Tiko in ANSI mode, it will become dim as wstring * 8 txt = "?? ??".


fbfans

Thank you for your reply; the explanation is clear and easy to understand.
Is this the reason for my problem?
When the file is in ANSI format, it reads text as ANSI by default, and DWString correctly converts ANSI to UTF-8.
When the file is in UTF-8 format, the text is already UTF-8, but since it is not specified as UTF-8, FreeBASIC still reads the UTF-8 string as ANSI by default, which results in garbled text because it cannot be parsed correctly.
Therefore, I need to use DWString("你好 围棋", CP_UTF8) to explicitly specify that the text is UTF-8, so that DWString can parse it correctly.
The core point is exactly what you said:
Quote: Since FreeBASIC does not provide a separate data type for UTF-8 (it only has STRING), if you do not specify that it is UTF‑8 or CP_UTF8, the constructor will assume you are passing an ANSI string and use the CP_ACP code page to process it. Keep in mind that all characters in a UTF-8 string are treated as ANSI characters.
Thanks again.

José Roca

Yes. An ansi string can contain text that uses different code pages, not just utf-8. It can contain ansi text, utf-8, OEM and ansi with different code pages. DWSTRING supports them by letting you specify the code page. For example, the code page for utf-8 is CP_UTF8 and for OEM it is CP_OEMCP. UTF-16 strings don't need conversion because there is a native FreeBasic type that supports them, WSTRING, which is a fixed-length unicode data type. DWSTRING extends WSTRING to provide methods that allow it to behave like a dynamic unicode string.
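As a rough illustration of why the code page has to be supplied, the sketch below decodes a single byte, &hE9, under two different code pages using the Windows API directly. The helper name FirstCodeUnit and the choice of code pages 1252 (Western ANSI) and 437 (US OEM) are assumptions made for this example; both code pages must be installed on the system.

```freebasic
#include once "windows.bi"

' Hypothetical helper (not part of AfxNova): decodes s with codePage and
' returns the first UTF-16 code unit, or 0 if the conversion fails.
function FirstCodeUnit(byref s as string, byval codePage as UINT) as ulong
   dim as wstring * 8 buf
   if MultiByteToWideChar(codePage, 0, strptr(s), -1, @buf, 8) = 0 then return 0
   return buf[0]
end function

' One byte; nothing in the STRING says which code page it belongs to.
dim as string oneByte = chr(&hE9)

print "code page 1252: U+"; hex(FirstCodeUnit(oneByte, 1252), 4)   ' U+00E9 (e with acute accent)
print "code page 437:  U+"; hex(FirstCodeUnit(oneByte, 437), 4)    ' U+0398 (Greek capital theta)
```

The byte is identical in both calls; only the code page argument changes which Unicode character comes out. That is the information the optional code page parameter of the DWSTRING constructor carries.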