Re: [PATCH v4] man/man7/pathname.7: Add file documenting format of pathnames

Hi Jason,

On Wed, Jan 15, 2025 at 11:20:51AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths.  The information
> that I used to create this new manual page came from these sources:
> 
> • <https://unix.stackexchange.com/a/39179/316181>
> • <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> • <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
> • <https://docs.kernel.org/filesystems/affs.html#mount-options-for-the-affs>
> • <man:unix(7)>
> 
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
> Here’s what I changed from the previous version:

Thanks!  The page is starting to look good.  I'll make some minor comments
below.

> • The title of the page is now “pathname(7)”.
> • The list of kernel rules now mentions that paths can’t be longer than
>   4,096 bytes (Thanks for mentioning this, Florian).
> • The list of kernel rules now mentions that filenames can’t be longer
>   than 255 bytes.
> • I replaced the ext4 filename limitation example with an Amiga filename
>   limitation example.  It no longer made sense to say that ext4 limited
>   filenames to 255 bytes now that we’re saying that all filenames are
>   limited to 255 bytes.
> • I added UNIX domain sockets’ sun_path as an example of a situation
>   where the kernel puts additional limitations on paths (Thanks for
>   mentioning this, Florian).
> • I added additional sources to the commit message in order to account
>   for the new information added by this version.
> 
>  man/man7/pathname.7 | 61 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 man/man7/pathname.7
> 
> diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
> new file mode 100644
> index 000000000..15ff98e15
> --- /dev/null
> +++ b/man/man7/pathname.7
> @@ -0,0 +1,61 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH pathname 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +pathname \- how pathnames are encoded and interpreted

Maybe, since this also discusses filenames, we should use both names:

	.SH NAME
	filename,
	pathname
	\-
	...

> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with paths,
> +there are kernel space requirements that you must comply with

s/kernel space/kernel-space/

since it works as an adjective.

Also, I'd put a comma at the end of that line: s/$/,/

> +and userspace requirements that you should comply with.

s/userspace/user-space/

for similar reasons.

> +.P
> +The kernel stores paths as null-terminated byte sequences.
> +The kernel has a few general rules that apply to all paths:
> +.IP \[bu]

See man-pages(7):

   Lists
     There are different kinds of lists:

     [...]

     Bullet lists
            Elements  are preceded by bullet symbols (\[bu]).  Anything
            that doesn’t fit elsewhere is usually covered by this  type
            of list.

     [...]

     There should always be exactly 2 spaces between  the  list  symbol
     and  the  elements.   This  doesn’t  apply to "tagged paragraphs",
     which use the default indentation rules.

So, you'll need to use

	.IP \[bu] 3

in the first item (and only there; the following ones inherit the
value).

> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/).

How about adding this:

	and cannot be part of a filename.

> +.IP \[bu]
> +A path can be at most 4,096 bytes long.

For self-consistency, let's use the same term all of the time: either
path or pathname.  Otherwise, a reader might think they are different
things.

For consistency with POSIX, let's say pathname, since that's what POSIX
uses:
<https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_254>

> +A path that’s longer than 4,096 bytes can be split into multiple smaller paths
> +and opened piecewise using
> +.BR openat (2).
> +.IP \[bu]
> +Filenames can be at most 255 bytes long.

For consistency with bullet one:

s/Filenames/A filename/
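
For illustration only, the piecewise openat(2) approach from the bullet
above could look roughly like the following.  It is a minimal sketch,
assuming Linux/glibc headers (which define NAME_MAX as 255) and assuming
that every component names a directory; open_dir_piecewise() is a
hypothetical name, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>   /* open(), openat(), O_DIRECTORY */
#include <limits.h>  /* NAME_MAX */
#include <string.h>
#include <unistd.h>  /* close() */

/*
 * Hypothetical helper: open the directory named by PATH by walking it one
 * component at a time with openat(2), so that no single system call is
 * given more than one filename.  Each component is checked against
 * NAME_MAX.  Returns a file descriptor, or -1 with errno set.
 */
int open_dir_piecewise(const char *path)
{
    if (*path == '\0') {             /* open(2) also rejects "" */
        errno = ENOENT;
        return -1;
    }

    /* Absolute paths start at "/", relative ones at ".". */
    int dir = open(*path == '/' ? "/" : ".", O_RDONLY | O_DIRECTORY);
    if (dir == -1)
        return -1;

    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')            /* skip 0x2F separator bytes */
            p++;
        if (*p == '\0')
            break;

        const char *slash = strchr(p, '/');
        size_t len = slash != NULL ? (size_t)(slash - p) : strlen(p);
        if (len > NAME_MAX) {
            close(dir);
            errno = ENAMETOOLONG;
            return -1;
        }

        char name[NAME_MAX + 1];
        memcpy(name, p, len);
        name[len] = '\0';

        int next = openat(dir, name, O_RDONLY | O_DIRECTORY);
        int saved_errno = errno;
        close(dir);                  /* the parent is no longer needed */
        if (next == -1) {
            errno = saved_errno;
            return -1;
        }
        dir = next;
        p += len;
    }
    return dir;
}
```

Because each openat(2) call receives a single filename, the 4,096-byte
limit applies per call rather than to the whole string, while the
255-byte filename limit still applies to every component.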

> +.P
> +The kernel also has some rules that only apply in certain situations.
> +Here are some examples:
> +.IP \[bu]
> +If you want to store a file on an Amiga filesystem,
> +then its filename can’t be longer than 30 bytes.

I would simplify and make it more consistent with the bullets above:

	-  Filenames on the Amiga filesystem can be at most 30 bytes long.

> +.IP \[bu]
> +If you want to store a file on a vfat filesystem,
> +then its filename can’t contain a 0x3A byte (: in ASCII)

Is that the only one?  I expect there are several characters that are
not allowed in vfat.

> +unless the filesystem was mounted with iocharset set to something unusual.
> +.IP \[bu]
> +A UNIX domain socket’s sun_path can be at most 108 bytes long (see
> +.BR unix (7)
> +for details).
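
A rough sketch of checking two of these situational rules before calling
into the kernel.  Both helpers, vfat_name_ok() and make_unix_addr(), are
hypothetical.  The vfat check assumes the set of characters that Windows
reserves in filenames (control bytes, and " * / : < > ? \ |); whether the
Linux vfat driver rejects exactly that set, and how iocharset changes it,
would need checking against the kernel sources.  The sun_path check is
deliberately conservative: it requires room for a terminating null byte,
as unix(7) recommends for portable code.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>  /* AF_UNIX */
#include <sys/un.h>      /* struct sockaddr_un */

/*
 * Hypothetical check for a vfat filename.  ASSUMPTION: the rejected bytes
 * are the ones Windows reserves (0x01-0x1F and " * / : < > ? \ |); the
 * exact set Linux enforces may depend on mount options such as iocharset.
 */
bool vfat_name_ok(const char *name)
{
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p < 0x20 || strchr("\"*/:<>?\\|", *p) != NULL)
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: fill in ADDR for PATH, refusing paths that would not
 * fit in sun_path (108 bytes on Linux) together with a terminating null byte.
 */
bool make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof addr->sun_path)
        return false;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);   /* includes the null byte */
    return true;
}
```
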
> +.P
> +Userspace treats paths differently.

s/Userspace/User space/

> +Userspace applications typically expect paths to use

s/Userspace/User-space/ (adjective, as above)

> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.
> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.

It might be interesting to add an example program.
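
Something along these lines could serve as a starting point: a minimal
sketch, assuming a POSIX system, that reports the locale's codeset with
nl_langinfo(3) and uses mbstowcs(3) to test whether a filename decodes
cleanly in it.  locale_codeset() and name_decodes() are hypothetical
names.

```c
#include <langinfo.h>  /* nl_langinfo(), CODESET */
#include <locale.h>    /* setlocale() */
#include <stdbool.h>
#include <stdlib.h>    /* mbstowcs() */

/*
 * Hypothetical helper: switch LC_CTYPE to the locale named by the
 * environment (LANG, LC_ALL, ...) and return that locale's codeset name,
 * such as "UTF-8".  If the environment names an unavailable locale,
 * setlocale() fails and the current (by default "C") locale is kept.
 */
const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/*
 * Hypothetical helper: true if NAME is a valid multibyte string in the
 * current LC_CTYPE codeset, i.e. it can be decoded into wide characters.
 */
bool name_decodes(const char *name)
{
    /* With a null destination, mbstowcs() only counts characters and
     * returns (size_t) -1 on an invalid byte sequence. */
    return mbstowcs(NULL, name, 0) != (size_t) -1;
}
```

A real program would normally call setlocale(3) once at startup rather
than inside a helper; it is done here only to keep the sketch
self-contained.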

> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use for their own paths to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
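
As a sketch of that recommendation, a check against the Portable
Filename Character Set (A-Z, a-z, 0-9, period, underscore, hyphen-minus)
might look like this.  portable_filename() is a hypothetical helper; it
compares bytes directly rather than using isalnum(3), so the answer does
not change with the locale.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every byte of NAME is in the POSIX
 * Portable Filename Character Set. */
bool portable_filename(const char *name)
{
    static const char set[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";

    if (*name == '\0')
        return false;                 /* an empty filename is not valid */
    /* strspn() counts the leading bytes that all belong to SET. */
    return strspn(name, set) == strlen(name);
}
```
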
> +.SH SEE ALSO
> +.BR open (2),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7)

Also interesting:

	.BR mount (8)

(It talks about iocharset.)


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
