Hi Jason,

On Mon, Jan 13, 2025 at 04:32:46PM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths. The information
> that I used to create this new manual page came from this Unix & Linux
> Stack Exchange answer [1] and from this Libc-help mailing list post [2].
>
> [1]: <https://unix.stackexchange.com/a/39179/316181>
> [2]: <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
>
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
>  man/man7/path-format.7 | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>  create mode 100644 man/man7/path-format.7
>
> diff --git a/man/man7/path-format.7 b/man/man7/path-format.7
> new file mode 100644
> index 000000000..c3c01cbf5
> --- /dev/null
> +++ b/man/man7/path-format.7
> @@ -0,0 +1,41 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH PATH-FORMAT 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +path-format \- how pathnames are encoded and interpreted

I would use path_format instead of path-format or PATH-FORMAT.

> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with paths,
> +there are kernel space requirements that you must comply with
> +and userspace requirements that you should comply with.
> +.P
> +The kernel stores paths as null-terminated byte sequences.
> +As far as the kernel is concerned, there are only three rules for paths:
> +.IP \[bu]
> +The last byte in the sequence needs to be a null.
> +.IP \[bu]
> +Any other bytes in the sequence need to not be null bytes.

... need to be non-null bytes.

seems easier to read.

> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/).
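To make the three rules above concrete, here is a minimal sketch in C.
It is not part of the patch: path_bytes_ok() and count_components() are
hypothetical names, and the checks cover only the byte-level rules.
Length limits are ignored, and an empty path passes even though most
system calls reject it with ENOENT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: check the first two rules for a buffer of
 * `len` bytes (the length includes the terminating null byte).
 */
bool path_bytes_ok(const unsigned char *buf, size_t len)
{
    /* Rule 1: the last byte in the sequence must be a null byte. */
    if (len == 0 || buf[len - 1] != '\0')
        return false;

    /* Rule 2: every other byte must be non-null. */
    return memchr(buf, '\0', len - 1) == NULL;
}

/*
 * Hypothetical helper: count the components of a null-terminated
 * path.  Rule 3: only the 0x2F byte separates components, whatever
 * the other bytes happen to be.
 */
size_t count_components(const unsigned char *path)
{
    size_t n = 0;
    bool in_component = false;

    for (; *path != '\0'; path++) {
        if (*path == 0x2F) {
            in_component = false;   /* a separator ends the component */
        } else if (!in_component) {
            in_component = true;    /* first byte of a new component */
            n++;
        }
    }
    return n;
}
```

For example, the byte sequence 'a' 0xFF 0x2F 0xFE 'b' 0x00 is not valid
UTF-8, yet it satisfies both byte-level checks and splits into two
components.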
> +.P
> +This means that programs can technically do weird things
> +like create paths using random character encodings
> +or create paths without using any character encoding at all.
> +Filesystems may impose additional restrictions on paths, though.
> +For example, if you want to store a file on an ext4 filesystem,
> +then its filename can’t be longer than 255 bytes.
> +.P
> +Userspace treats paths differently.
> +Userspace applications typically expect paths to use
> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.

I would say that for maximum interoperability one should self-limit to
the POSIX Portable Filename Character Set:
<https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265>


Have a lovely night!
Alex

> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.
> +.SH SEE ALSO
> +.BR open (2),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7)
> -- 
> 2.47.0
> 

-- 
<https://www.alejandro-colomar.es/>