On Mon, Jan 27, 2025 at 04:53:10PM +0100, Alejandro Colomar wrote:
> Right. But then, when do you need to do encoding?

Personally, my preference is that programs use the locale’s codeset
because I can override the locale codeset in the rare event that UTF-8
isn’t the correct option. In my previous example, I was able to set the
LANG environment variable to ja_JP.SJIS so that I could run that old
software in an environment where pathnames were encoded in Shift-JIS. If
everything always assumed a particular character encoding for pathnames,
then I wouldn’t have been able to do that. That being said, I still don’t
really know if that’s the best option.

> Programs will either receive the pathname from the command line, or
> read it from some file, or create one of its own.
>
> When creating a path of its own, it should restrict itself to the
> Portable Filename Character Set, so encoding shouldn't be a problem.
>
> When reading pathnames, they'll already be encoded suitably.
>
> > > Instead, I think a good recommendation would be to behave in one of the
> > > following ways:
> > >
> > > - Accept only the POSIX Portable Filename Character Set.
> >
> > This one isn’t quite a complete recommendation. The POSIX Portable
> > Filename Character Set is just a character set. It’s not a character
> > encoding. If we go with this one, then we would need to say something
> > along the lines of “Encode and decode paths using ASCII and only accept
> > characters that are in the POSIX Portable Filename Character Set.”
> >
> > > - Assume UTF-8, but reject control characters.
> > > - Assume UTF-8.
> >
> > > - Accept anything, but reject control characters.
> > > - Accept anything, just like the kernel.
> >
> > These last two also aren’t quite complete recommendations. If a GUI
> > program wants to display a pathname on the screen, then what character
> > encoding should it use when decoding the bytes?
>
> Just print them as they got in. No decoding. Send the raw bytes to
> write(2) or printf(3) or whatever.

I don’t think that printing is a good way for GUI applications to
display text. I don’t normally run GUI applications in a terminal, so
I’m not normally able to see a GUI application’s stdout or stderr. Most
of the GUI applications that I use display pathnames as part of a larger
window. In order to do that, the GUI application needs to know which
characters the bytes in the pathname represent so that it can draw those
characters on the screen.