Re: Knowing the length(convert(username using windows_1251_to_utf8))

"Alexander Farber" <alexander.farber@xxxxxxxxx> · Thu, 11 Jan 2007 12:37:32 +0100

Hi Martijn,

On 1/11/07, Martijn van Oosterhout <kleptog@xxxxxxxxx> wrote:
> If you need the string in UTF-8, why not just set the "client_encoding"
> to "utf8" and then the server will only send you strings in utf8, no
> conversion necessary.

Actually, you are right, because I need all my data in UTF8 anyway
(for a web flash client). So I've followed your advice and added:

  PQsetClientEncoding(conn, "UTF8")

and now my program works the same, but without that convert().

>> Is there please a way to know the length of the utf8 data?
>> (I'm using a fixed char array in my C program)
>
> UTF-8 is always variable length, I think up to 4 bytes per character.
> Maybe you shouldn't be using a fixed-length array?

OK, I'll go for a fixed array 4 times bigger for now,
because I'd like to keep my webchat-like app quick.
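
A minimal sketch of how such a fixed array could be filled safely,
assuming the value arrives as a 0-terminated UTF-8 string. The name
utf8_copy is a hypothetical helper for illustration, not part of libpq.
If the value does not fit, it cuts at a character boundary instead of
in the middle of a multi-byte sequence:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libpq function): copy src into dst, which
 * holds dstsize bytes. The result is always 0-terminated and a
 * multi-byte UTF-8 sequence is never cut in half. Assumes src is valid
 * UTF-8. Returns the number of bytes copied, excluding the terminator. */
size_t utf8_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    size_t n = strlen(src);
    if (n >= dstsize) {
        n = dstsize - 1;
        /* src[n] is the first byte that gets dropped. If it is a
         * continuation byte (bit pattern 10xxxxxx), the cut falls
         * inside a character, so back up to that character's start. */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a buffer declared as, say, char name[4 * 32 + 1] (32 characters
is an assumed limit, the + 1 is for the terminating 0), a call like
utf8_copy(name, sizeof name, value) would keep the array valid even
if a longer value turns up.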

In your next email you ask:
>> Can I still be sure that the data returned in the
>> convert(username using windows_1251_to_utf8)
>> column will be 0-terminated or should I fetch
>> the data length using PQgetlength and maintain
>> that value in my C-program?
>
> On the client end (as long as you're not doing binary transfers) the
> strings are always null terminated.

May I ask you an off-topic question? I've read several
docs on Unicode, but they are difficult to understand.

Do you think that a UTF8 string will ever have a 0 byte
inside of it? Or is it safe to continue using strlen/strlcpy/strcmp
on the UTF8 values I'll be fetching from my database?
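
As far as I understand the encoding, every byte of a multi-byte UTF-8
sequence has its high bit set, so a 0 byte can only be the encoding of
the NUL character (U+0000) itself. The catch is that strlen then
returns bytes, not characters. A minimal sketch of counting characters
instead, assuming valid 0-terminated UTF-8 input (utf8_char_count is a
hypothetical name, not a standard function):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a 0-terminated UTF-8
 * string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* Only ASCII bytes and lead bytes start a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a two-letter Cyrillic name strlen would report 4 while this
reports 2. strcmp still works for equality tests, since it compares
the raw bytes.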

Regards
Alex

PS: Using postgresql-server-8.1.4 on OpenBSD 4.0-stable

--
http://preferans.de