[CC += Florian]

Hi Jason, Florian,

On Tue, Jan 14, 2025 at 07:54:45AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths. The information
> that I used to create this new manual page came from this Unix & Linux
> Stack Exchange answer [1], this Libc-help mailing list post [2] and this
> line of code from the kernel [3].
>
> [1]: <https://unix.stackexchange.com/a/39179/316181>
> [2]: <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> [3]: <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
>
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
> Here’s what I changed from the previous version:
>
> • The title of the page is now “path_format”. It’s now always written in all lowercase.
> • The second kernel rule now uses the suggested phrase “…need to be non-null bytes”.
> • The manual page now recommends self-limiting to the POSIX Portable Filename Character Set.
> • A missing word (byte) was added to the first kernel rule.
> • I added a missing source to the commit message.
>
>  man/man7/path_format.7 | 47 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>  create mode 100644 man/man7/path_format.7
>
> diff --git a/man/man7/path_format.7 b/man/man7/path_format.7
> new file mode 100644
> index 000000000..0a129eeba
> --- /dev/null
> +++ b/man/man7/path_format.7
> @@ -0,0 +1,47 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH path_format 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +path_format \- how pathnames are encoded and interpreted
> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.

Maybe we should call the page pathname(7)?

> +When writing code that deals with paths,
> +there are kernel space requirements that you must comply with
> +and userspace requirements that you should comply with.
> +.P
> +The kernel stores paths as null-terminated byte sequences.

There's a specific term for this: string. Which means you don't need to
explain so much about the null byte. It is understood that a string
cannot contain null bytes (except for the terminator itself).

> +As far as the kernel is concerned, there are only three rules for paths:
> +.IP \[bu]
> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/).
> +.P
> +This means that programs can technically do weird things
> +like create paths using random character encodings
> +or create paths without using any character encoding at all.

I think I would skip this. It is implied by the fact that the only
forbidden character in a filename is '/'.

> +Filesystems may impose additional restrictions on paths, though.
> +For example, if you want to store a file on an ext4 filesystem,
> +then its filename can’t be longer than 255 bytes.

It might be good to mention that some filesystems restrict the valid
characters in a filename.

> +.P
> +Userspace treats paths differently.
> +Userspace applications typically expect paths to use
> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.

Do we want to recommend that? IMHO, for maximum portability, programs
should assume the Portable Filename Character Set (or at most some
subset of ASCII), and fail hard outside of that, which in turn
encourages users to restrict themselves to portable file names.

Cheers,
Alex

> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.
> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use for their own paths to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
> +.SH SEE ALSO
> +.BR open (2),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7)
> -- 
> 2.47.0
>

-- 
<https://www.alejandro-colomar.es/>
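
A minimal sketch, in C, of the "assume the Portable Filename Character
Set and fail hard outside of that" approach suggested in the review
above. The helper name is_portable_filename() is hypothetical and is not
part of the patch or of any library. The 255-byte cap is an assumption
borrowed from the ext4 example in the patch, not a value POSIX
guarantees, and the leading-hyphen check follows POSIX's advice that a
portable filename should not start with a hyphen-minus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if NAME is a single filename (not a
 * whole path) made up only of bytes from the POSIX Portable Filename
 * Character Set: A-Z, a-z, 0-9, '.', '_' and '-'.
 */
static bool
is_portable_filename(const char *name)
{
	static const char portable[] =
	    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "abcdefghijklmnopqrstuvwxyz"
	    "0123456789._-";
	size_t len;

	len = strlen(name);

	/* Assumption: 255 is the ext4 limit quoted in the patch. */
	if (len == 0 || len > 255)
		return false;

	/* POSIX discourages a leading hyphen-minus in portable names. */
	if (name[0] == '-')
		return false;

	/*
	 * strspn() returns the length of the leading run of bytes that
	 * all belong to the set, so the name is portable only if that
	 * run covers the whole string.  A '/' (0x2F) is not in the set,
	 * so anything containing a directory separator is rejected.
	 */
	return strspn(name, portable) == len;
}
```

A caller that wants to fail hard would run each filename through this
check before handing it to, say, open(2), and report an error when the
check fails. Note that "." and ".." pass, since they consist only of
portable characters; a caller that cares has to treat them separately.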