Re: [PATCH v2] man/man7/path-format.7: Add file documenting format of pathnames


On Wed, Jan 15, 2025 at 12:06:10AM +0100, Alejandro Colomar wrote:
> Hmmm, yep, let's make it pathname(7).

OK, I’ll submit a new version that uses pathname(7) as the title.

> Makes sense.  How about a null-terminated string?

The term null-terminated string still has some of the problems that I
mentioned earlier.  Specifically, people think of null-terminated
strings as sequences of characters.  It’s easier to understand how the
kernel handles paths if you think of paths as sequences of bytes, not as
sequences of characters.

Also, people typically make assumptions about the encoding of
null-terminated strings in the C programming language.  It’s reasonable
to assume that a char * is encoded in the execution character set, that
a wchar_t * is encoded in the wide execution character set, that a
char8_t * is encoded in UTF-8, that a char16_t * is encoded in UTF-16
and that a char32_t * is encoded in UTF-32.  Paths don’t necessarily
have one character encoding, and their character encoding may not be any
of those.
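
To make the bytes-versus-characters point concrete, here is a minimal
sketch in Python.  It assumes Linux and a filesystem that stores names as
opaque bytes (ext4 and tmpfs behave this way; some other filesystems
reject such names).  The helper names roundtrip_raw_name and
is_valid_utf8 and the file name itself are made up for illustration.

```python
import os
import tempfile


def roundtrip_raw_name(raw_name: bytes) -> bytes:
    """Create a file called raw_name in a fresh temporary directory and
    return the name that the directory listing reports back."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Passing bytes to the os functions hands them to the kernel unchanged.
        path = os.path.join(tmp_bytes, raw_name)
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
        # A bytes argument makes os.listdir return bytes names.
        (listed,) = os.listdir(tmp_bytes)
        return listed


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # "café.txt" encoded as Latin-1: the byte 0xE9 is not valid UTF-8 here.
    name = b"caf\xe9.txt"
    print(roundtrip_raw_name(name) == name)
    print(is_valid_utf8(name))
```

Under those assumptions, the name comes back byte for byte even though it
is not valid UTF-8.  Nothing in the path itself records which encoding,
if any, was intended.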

> > I have a concern about programs failing hard when paths contain
> > non-ASCII characters.  I have a lot of songs and medleys saved on my
> > computer.  The paths for over 10,000 of them contain non-ASCII
> > characters.  Most of those non-ASCII characters come from Chinese,
> > Japanese or Korean characters that are in the titles of songs or
> > medleys.  If programs failed hard on paths that contain non-ASCII
> > characters, what impact would that have on my music collection?
> 
> The core utils (e.g., rm(1) et al.) are nice and work well for arbitrary
> characters, to allow you to fix them.  But yeah, most high level
> programs and (especially) scripts aren't so nice.  Think for example of
> makefiles, where handling files with spaces correctly is almost
> impossible.

I agree that the core utils work well with arbitrary paths.  I’m less
sure that most high-level programs and scripts handle spaces and
non-ASCII characters badly.  Most of the ones that I personally use work
fine with paths that contain spaces and non-ASCII characters, but I
don’t know whether that holds for programs and scripts in general.  I
also agree that handling spaces correctly in makefiles is almost
impossible, which is why I don’t use makefiles for my own personal
projects.

That being said, I think that you misunderstood my two questions.  You
told me the current state of things.  I’m not asking about the current
state of things; I’m asking about a hypothetical future where programs
start to “assume the Portable Filename Character Set (or at most some
subset of ASCII), and fail hard outside of that”.  If we start making
that recommendation and programs start following that recommendation,
then it sounds like I wouldn’t be able to do anything with a large part
of my music collection, and it sounds like I wouldn’t be able to use the
symbolic links that are in my /dev/disk/by-partlabel directory.  Am I
understanding your recommendation correctly?
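
To show what that hypothetical future would mean in practice, here is a
rough sketch in Python of the kind of check a program might perform.  It
is only an illustration of the scenario being asked about, not anything
that has been proposed in this thread.  The function name
is_portable_path and the example paths are invented.  The character set
is the POSIX Portable Filename Character Set (A-Z, a-z, 0-9, period,
underscore, hyphen); the sketch ignores the additional POSIX advice that
a portable filename should not start with a hyphen.

```python
import string

# POSIX Portable Filename Character Set, stored as byte values.
PORTABLE = frozenset(
    (string.ascii_letters + string.digits + "._-").encode("ascii")
)


def is_portable_path(path: bytes) -> bool:
    """Return True if every component of path uses only bytes from the
    Portable Filename Character Set.  '/' is treated as the separator."""
    return all(
        byte in PORTABLE
        for component in path.split(b"/")
        for byte in component
    )


if __name__ == "__main__":
    examples = [
        b"/home/user/music/track_01.flac",
        b"/home/user/music/track 01.flac",
        "/home/user/music/メドレー.flac".encode("utf-8"),
    ]
    for example in examples:
        print(is_portable_path(example), example)
```

A program that failed hard whenever this check returned False would
accept only the first example; the name with a space and the name with
Japanese characters would both be rejected.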



