OpenConnect development started in 2008, on a modern Linux box. I hadn't really operated a Linux box where the system locale was anything other than UTF-8 for a number of years by then, and it seemed reasonable enough for the charset handling to be fairly much non-existent, based on the assumption that "everything is UTF-8, all of the time". Now, however, OpenConnect has been ported to a number of systems where that assumption isn't valid, so I've made an attempt to deal with this.

I think the Java, GNOME and KDE GUIs *were* all using UTF-8 anyway, although I'm not sure about Shimo (Fabian?). So all I've done so far is add conversion in main.c: write_progress() will convert the string from UTF-8 to the current locale before printing it, while read_stdin() is now used for all user input and will convert *to* UTF-8. And the command line arguments are likewise converted to UTF-8 before being given to libopenconnect.

Now it can use non-ASCII passwords in non-UTF-8 systems. And I even have things working under Windows having renamed my tun device to 'TAP?' and specifying '--interface TAP?' on the command line.

The approach is still for *libopenconnect* to assume that everything is UTF-8, and so far nothing's changed for users of the library. However, that probably needs to change. At the very least, we need to convert file names from UTF-8 to legacy encoding before trying to open them. I think this was already broken for GNOME and KDE users where the strings (including filenames) will always have been UTF-8, and if they are actually stored on the file system using a legacy locale then the lack of conversion may already have been an issue.

So I think I need an internal function open_utf8() which will convert a UTF-8 filename to legacy encoding before trying to open it. The legacy encoding will be automatically discovered by nl_langinfo(CODESET), and perhaps we'll want a new openconnect_set_legacy_charset() function to allow the user to override that.
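To make the open_utf8() idea concrete, here is a rough sketch of what it might look like on a POSIX system with iconv(). This is an illustration only, not the code in the tree: the helper utf8_to_legacy(), the legacy_charset_override variable (standing in for whatever openconnect_set_legacy_charset() would store) and the output buffer sizing are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical override, standing in for what the proposed
 * openconnect_set_legacy_charset() might store. NULL means
 * "ask the locale". */
static const char *legacy_charset_override;

static const char *legacy_charset(void)
{
	/* nl_langinfo() only reflects the user's locale if the
	 * application has already called setlocale(LC_CTYPE, ""). */
	return legacy_charset_override ? legacy_charset_override
				       : nl_langinfo(CODESET);
}

/* Hypothetical helper: convert a UTF-8 string to 'charset'.
 * Returns a malloc()ed string, or NULL with errno set. */
static char *utf8_to_legacy(const char *utf8, const char *charset)
{
	size_t inleft, outsz, outleft;
	char *inp, *out, *outp;
	iconv_t ic;
	int err;

	assert(utf8 && charset);

	ic = iconv_open(charset, "UTF-8");
	if (ic == (iconv_t)-1)
		return NULL;

	inp = (char *)utf8;		/* iconv() wants a non-const char ** */
	inleft = strlen(utf8);
	outsz = inleft * 4 + 16;	/* assumption: a generous guess */
	outleft = outsz - 1;		/* keep room for the trailing NUL */
	out = outp = malloc(outsz);
	if (!out) {
		iconv_close(ic);
		errno = ENOMEM;
		return NULL;
	}

	/* Convert, then flush any shift state for stateful encodings. */
	if (iconv(ic, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
	    iconv(ic, NULL, NULL, &outp, &outleft) == (size_t)-1) {
		err = errno;
		free(out);
		iconv_close(ic);
		errno = err;
		return NULL;
	}
	iconv_close(ic);
	*outp = 0;
	return out;
}

/* Sketch of open_utf8(): convert the name, then open() it. */
static int open_utf8(const char *fname, int flags, mode_t mode)
{
	char *legacy = utf8_to_legacy(fname, legacy_charset());
	int fd, err;

	if (!legacy)
		return -1;

	fd = open(legacy, flags, mode);
	err = errno;		/* free() may clobber errno */
	free(legacy);
	errno = err;
	return fd;
}
```

A caller would use it just like open(), e.g. open_utf8(name, O_RDONLY, 0). In this sketch a name that can't be represented in the legacy charset makes the call fail (with EILSEQ on glibc) rather than opening something with a mangled name, which seems like the safer behaviour, but that's a design choice and other iconv implementations may substitute characters instead.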
Does that sound reasonable? In the 20th century world of legacy locales, is it reasonable to assume that the filename used in open() is in the charset specified by LC_CTYPE? (Actually it's a per-filesystem thing, but systems which predate UTF-8 aren't going to be coping with that anyway, are they?)

And is there anything *else* I've missed? I suppose the tun device name under Linux might also need to be converted, although Linux really *ought* to be using UTF-8. Perhaps the $CISCO_BANNER environment variable passed to the vpnc-script? Anything else?

-- 
dwmw2