> Is there a regex package in GLib that is capable of searching/matching wide
> characters?

No. GLib's string APIs (except for the explicit wide-character conversion
ones) handle only plain char strings, which are generally assumed to be UTF-8
in cases where the encoding matters. But if you know that a file is in wide
characters (i.e. UTF-16LE on Windows), you can use g_utf16_to_utf8() to
convert its contents to UTF-8 once you have read it in (or mapped it into
memory).

> for future reference, I would like to try and track down a wchar_t
> implementation of regex functions. I was hoping GLib already had them, but
> perhaps I am wrong.

Wide characters (wchar_t), although per se part of standard C, are in practice
used mostly in Windows-specific programming. On Unix and Linux, especially in
free software circles, encoding Unicode as UTF-8 is the rule, and thus normal
string functions and coding conventions can be used. (One notable exception
is OpenOffice.org, which used UTF-16 internally also on Unix. I don't know
about Mozilla, for instance.) So in software developed mainly by people using
Linux, you seldom see wchar_t.

(Note that the wchar_t type in gcc on Linux is 32 bits, not 16 bits as on
Windows, so a single wchar_t can represent any character in current Unicode.
On Windows, when you use wchar_t strings, you still have to take into
consideration that some characters take a pair of wchar_ts, so in practice
the code you end up writing doesn't differ significantly from code that
handles UTF-8 or other variable-length encodings anyway. It is a question of
handling Unicode characters as 1..4 chars or as 1..2 wchar_ts. You can't just
pretend that each wchar_t is a freestanding character, or that wchar_t strings
can be split at any place with each part remaining valid. Surrogate pairs do
exist.)

--tml