On Mon, Jan 27, 2025 at 06:37:40PM +0100, Alejandro Colomar wrote:
> [CC += наб]
>
> Hi Jason,
>
> On Mon, Jan 27, 2025 at 12:14:43PM -0500, Jason Yundt wrote:
> > On Mon, Jan 27, 2025 at 04:53:10PM +0100, Alejandro Colomar wrote:
> > > Right.  But then, when do you need to do encoding?
> >
> > Personally, my preference is that programs use the locale’s codeset
> > because I can override the locale codeset in the rare event that UTF-8
> > isn’t the correct option.  In my previous example, I was able to set the
> > LANG environment variable to jp_JP.SJIS so that I could run that old
> > software in an environment where pathnames were encoded in Shift-JIS.
> > If everything just always assumed a particular character encoding for
> > pathnames, then I wouldn’t have been able to do that.
>
> But if the program handles arbitrary strings, just like the kernel does,
> that would work too.
>
> > > > > - Accept anything, but reject control characters.
> > > > > - Accept anything, just like the kernel.
> > > >
> > > > These last two also aren’t quite complete recommendations.  If a GUI
> > > > program wants to display a pathname on the screen, then what character
> > > > encoding should it use when decoding the bytes?
> > >
> > > Just print them as they got in.  No decoding.  Send the raw bytes to
> > > write(2) or printf(3) or whatever.
> >
> > I don’t think that printing is a good way for GUI applications to
> > display text.  I don’t normally run GUI applications in a terminal, so
> > I’m not normally able to see a GUI application’s stdout or stderr.  Most
> > of the GUI applications that I use display pathnames as part of a larger
> > window.  In order to do that, the GUI application needs to know which
> > characters the bytes in the pathname represent so that the GUI
> > application can draw those characters on the screen.
>
> I would do in a GUI exactly the same as what command-line programs do:
> pass the raw string to whatever API prints them.  If the string makes
> sense in the current locale, it will be shown nicely.  If it doesn't
> make sense, it will display weird characters, but that's not a terrible
> issue.  Just run again with the appropriate locale.

OK, but how does that API figure out what characters to display?  What
character encoding should that API use when drawing the characters?  I
think that it’s OK to replace the current recommendation, but
pathname(7) should really explain how such an API would figure out what
characters need to be drawn on the screen.

> For example, in the git repository of the Linux man-pages project, there
> are commits authored by наб <nabijaczleweli@xxxxxxxxxxxxxxxxxx>.
> Whenever I see the git-log(1) in one of my systems with the C locale, I
> see weird characters.  I just need to re-run with the C.UTF-8 locale.
>
> But it handles the bytes correctly, even if they don't make sense to the
> system.  If git(1) failed whenever a string doesn't make sense in the
> current locale, the repo would be corrupted sooner than later.
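
To make the distinction concrete, here is a minimal sketch (in Python,
only because it is short) of one way a display API could behave: the
stored bytes are never modified or rejected, and decoding happens only
at the moment of display, using the locale's codeset.  The names
locale_codeset() and decode_for_display() are hypothetical, and
substituting U+FFFD for undecodable bytes is an assumption of this
sketch, not something pathname(7) or any particular toolkit specifies.

```python
import locale


def locale_codeset() -> str:
    """Return the codeset named by the current locale (POSIX systems only).

    Hypothetical helper: it adopts LANG/LC_* from the environment and
    asks nl_langinfo(CODESET), e.g. 'UTF-8' under C.UTF-8.
    """
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def decode_for_display(raw: bytes, codeset: str) -> str:
    """Turn pathname bytes into characters for drawing on screen.

    Bytes that are invalid in `codeset` become U+FFFD instead of raising
    an error, so display never fails.  `raw` itself is left untouched and
    is what should still be passed to open(2) and friends.
    """
    return raw.decode(codeset, errors="replace")


if __name__ == "__main__":
    raw = "наб".encode("utf-8")  # the bytes as stored, e.g. in a commit
    print(decode_for_display(raw, "UTF-8"))  # readable under a UTF-8 codeset
    print(decode_for_display(raw, "ascii"))  # replacement characters only
    # In a real program the codeset would come from the environment:
    #     decode_for_display(raw, locale_codeset())
```

Under these assumptions, re-running with a different locale changes only
what decode_for_display() returns; the bytes on disk stay the same, which
matches the git-log(1) behavior described above.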