On Wed, Jan 15, 2025 at 12:06:10AM +0100, Alejandro Colomar wrote:
> Hmmm, yep, let's make it pathname(7).

OK, I’ll submit a new version that uses pathname(7) as the title.

> Makes sense. How about a null-terminated string?

The term “null-terminated string” still has some of the problems that I mentioned earlier. Specifically, people think of null-terminated strings as sequences of characters. It’s easier to understand how the kernel handles paths if you think of paths as sequences of bytes, not as sequences of characters.

Also, people typically make assumptions about the encoding of null-terminated strings in the C programming language. It’s reasonable to assume that a char * is encoded in the execution character set, that a wchar_t * is encoded in the wide execution character set, that a char8_t * is encoded in UTF-8, that a char16_t * is encoded in UTF-16 and that a char32_t * is encoded in UTF-32. Paths don’t necessarily have one character encoding, and their character encoding may not be any of those.

> > I have a concern about programs failing hard when paths contain
> > non-ASCII characters. I have a lot of songs and medleys saved on my
> > computer. The paths for over 10,000 of them contain non-ASCII
> > characters. Most of those non-ASCII characters come from Chinese,
> > Japanese or Korean characters that are in the titles of songs or
> > medleys. If programs failed hard on paths that contain non-ASCII
> > characters, what impact would that have on my music collection?
>
> The core utils (e.g., rm(1) et al.) are nice and work well for arbitrary
> characters, to allow you to fix them. But yeah, most high level
> programs and (especially) scripts aren't so nice. Think for example of
> makefiles, where handling files with spaces correctly is almost
> impossible.

I agree that the core utils work well with arbitrary paths. I’m not so sure that most high-level programs and scripts handle spaces and non-ASCII characters badly. Most of the high-level programs and scripts that I personally use work fine with paths that contain spaces and non-ASCII characters, but I don’t know whether programs and scripts in general work that well. I also agree that handling spaces correctly in makefiles is almost impossible, which is why I don’t use makefiles for my own personal projects.

That being said, I think that you misunderstood my two questions. You told me the current state of things. I’m not asking about the current state of things; I’m asking about a hypothetical future where programs started to “assume the Portable Filename Character Set (or at most some subset of ASCII), and fail hard outside of that”. If we start making that recommendation and programs start following it, then it sounds like I wouldn’t be able to do anything with a large part of my music collection, and it sounds like I wouldn’t be able to use the symbolic links that are in my /dev/disk/by-partlabel directory. Am I understanding your recommendation correctly?
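
To make the bytes-versus-characters point concrete, here is a minimal Python sketch. It is not kernel code, and the names split_components and is_valid_utf8 as well as the example path are hypothetical, chosen only for illustration. It assumes that the only bytes with special meaning in a path are 0x2F ('/') and 0x00 (the terminator), and it ignores ".", "..", symbolic links and any extra restrictions that a particular filesystem might impose.

```python
def split_components(path: bytes) -> list[bytes]:
    """Split a path into components without decoding it.

    Only two byte values are treated specially here: 0x2F ('/')
    separates components and 0x00 ends the path.
    """
    # Anything from the first null byte onward is not part of the path.
    path = path.split(b"\x00", 1)[0]
    # Repeated slashes yield empty pieces, which are skipped.
    return [component for component in path.split(b"/") if component]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes happen to decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical path whose components do not share one encoding:
# one is UTF-8, one is Shift_JIS, and one is not text in either.
utf8_part = "音楽".encode("utf-8")
sjis_part = "音楽".encode("shift_jis")
path = b"/home/user/" + utf8_part + b"/" + sjis_part + b"/\xff\xfe.ogg"

for component in split_components(path):
    # Every component is an acceptable byte sequence, whether or not
    # it decodes as UTF-8.
    print(component, is_valid_utf8(component))
```

The splitting step never needs to know an encoding, which is why it seems more accurate to describe a path as a sequence of bytes than as a string of characters.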