Hi Jason,

On Wed, Jan 15, 2025 at 11:20:51AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths. The information
> that I used to create this new manual page came from these sources:
>
> • <https://unix.stackexchange.com/a/39179/316181>
> • <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> • <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
> • <https://docs.kernel.org/filesystems/affs.html#mount-options-for-the-affs>
> • <man:unix(7)>
>
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
> Here’s what I changed from the previous version:

Thanks!  The page starts looking good.  I'll make some minor comments
below.

> • The title of the page is now “pathname(7)”.
> • The list of kernel rules now mentions that paths can’t be longer than
>   4,096 bytes (Thanks for mentioning this, Florian).
> • The list of kernel rules now mentions that filenames can’t be longer
>   than 255 bytes.
> • I replaced the ext4 filename limitation example with a Amiga filename
>   limitation example. It no longer made sense to say that ext4 limited
>   filenames to 255 bytes now we’re saying that all filenames are limited
>   to 255 bytes.
> • I added UNIX domain sockets’s sun_path as an example of a situation
>   where the kernel puts additional limitations on paths (Thanks for
>   mentioning this, Florian).
> • I added additional sources to the commit message in order to account
>   for the new information added by this version.
>
>  man/man7/pathname.7 | 61 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 man/man7/pathname.7
>
> diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
> new file mode 100644
> index 000000000..15ff98e15
> --- /dev/null
> +++ b/man/man7/pathname.7
> @@ -0,0 +1,61 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH pathname 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +pathname \- how pathnames are encoded and interpreted

Maybe, since this also discusses filenames, we should use both names:

    .SH NAME
    filename, pathname \- ...

> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with paths,
> +there are kernel space requirements that you must comply with

s/kernel space/kernel-space/

since it works as an adjective.

also, I'd put a comma after that:

s/$/,/

> +and userspace requirements that you should comply with.

s/userspace/user-space/

for similar reasons.

> +.P
> +The kernel stores paths as null-terminated byte sequences.
> +The kernel has a few general rules that apply to all paths:
> +.IP \[bu]

See man-pages(7):

   Lists
     There are different kinds of lists:

     [...]

     Bullet lists
            Elements are preceded by bullet symbols (\[bu]).
            Anything that doesn’t fit elsewhere is usually covered
            by this type of list.

     [...]

     There should always be exactly 2 spaces between the list symbol
     and the elements.  This doesn’t apply to "tagged paragraphs",
     which use the default indentation rules.

So, you'll need to use

    .IP \[bu] 3

in the first item (and only there; the following ones inherit the
value).

> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/).

How about adding this?:

    and cannot be part of a filename.

> +.IP \[bu]
> +A path can be at most 4,096 bytes long.

For self-consistency, let's use the same term all of the time: either
path or pathname.  Otherwise, a reader might think they are different
things.

For consistency with POSIX, let's say pathname, since that's what POSIX
uses:
<https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_254>

> +A path that’s longer than 4,096 bytes can be split into multiple smaller paths
> +and opened piecewise using
> +.BR openat (2).
> +.IP \[bu]
> +Filenames can be at most 255 bytes long.

For consistency with bullet one:

s/Filenames/A filename/

> +.P
> +The kernel also has some rules that only apply in certain situations.
> +Here are some examples:
> +.IP \[bu]
> +If you want to store a file on an Amiga filesystem,
> +then its filename can’t be longer than 30 bytes.

I would simplify and make it more consistent with the bullets above:

-  Filenames on the Amiga filesystem can be at most 30 bytes long.

> +.IP \[bu]
> +If you want to store a file on a vfat filesystem,
> +then its filename can’t contain a 0x3A byte (: in ASCII)

Is that the only one?  I expect there are several characters that are
not allowed in vfat.

> +unless the filesystem was mounted with iocharset set to something unusual.
> +.IP \[bu]
> +A UNIX domain socket’s sun_path can be at most 108 bytes long (see
> +.BR unix (7)
> +for details).
> +.P
> +Userspace treats paths differently.

s/Userspace/User space/

> +Userspace applications typically expect paths to use .
> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.
> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.

It might be interesting to add an example program.
> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use for their own paths to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
> +.SH SEE ALSO
> +.BR open (2),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7)

Also interesting:

    .BR mount (8)

(It talks about iocharset.)


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>