The goal of this new manual page is to help people create programs that
do the right thing even in the face of unusual paths. The information
that I used to create this new manual page came from this Unix & Linux
Stack Exchange answer [1] and from this Libc-help mailing list post [2].

[1]: <https://unix.stackexchange.com/a/39179/316181>
[2]: <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
 man/man7/path-format.7 | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 man/man7/path-format.7

diff --git a/man/man7/path-format.7 b/man/man7/path-format.7
new file mode 100644
index 000000000..c3c01cbf5
--- /dev/null
+++ b/man/man7/path-format.7
@@ -0,0 +1,41 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH PATH-FORMAT 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+path-format \- how pathnames are encoded and interpreted
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with paths,
+there are kernel space requirements that you must comply with
+and userspace requirements that you should comply with.
+.P
+The kernel stores paths as null-terminated byte sequences.
+As far as the kernel is concerned, there are only three rules for paths:
+.IP \[bu]
+The last byte in the sequence needs to be a null.
+.IP \[bu]
+Any other bytes in the sequence need to not be null bytes.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/).
+.P
+This means that programs can technically do weird things
+like create paths using random character encodings
+or create paths without using any character encoding at all.
+Filesystems may impose additional restrictions on paths, though.
+For example, if you want to store a file on an ext4 filesystem,
+then its filename can’t be longer than 255 bytes.
+.P
+Userspace treats paths differently.
+Userspace applications typically expect paths to use
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
+Paths should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
+.SH SEE ALSO
+.BR open (2),
+.BR nl_langinfo (3),
+.BR path_resolution (7)
-- 
2.47.0
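
For illustration only (this is not part of the patch above), here is a
minimal C sketch of the rules the page describes. The names
example_filename_ok(), example_path_codeset(), and EXAMPLE_NAME_MAX are
hypothetical and exist only for this sketch. It assumes the 255-byte
ext4 filename limit quoted in the page, and it checks only the rules
listed there, so it does not attempt to cover special names or limits
of other filesystems.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the ext4 filename limit quoted in the page.
 * Other filesystems may use a different limit. */
#define EXAMPLE_NAME_MAX 255

/*
 * Hypothetical helper: checks whether `name` can serve as a single
 * filename (one path component) under the rules listed in the page.
 * A C string already satisfies the two null-byte rules: strlen()
 * stops at the first null byte, so no earlier byte can be null.
 */
static bool example_filename_ok(const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name);

    /* Reject empty names and names over the assumed ext4 limit. */
    if (len == 0 || len > EXAMPLE_NAME_MAX)
        return false;

    /* A 0x2F byte is always a directory separator, so it cannot
     * appear inside a single filename. Any other byte value is
     * accepted, whether or not it is valid in some encoding. */
    return memchr(name, 0x2F, len) == NULL;
}

/*
 * Hypothetical helper: returns the name of the current locale's
 * codeset, which the page suggests using to encode and decode paths.
 * Note that this changes the process-wide locale via setlocale().
 */
static const char *example_path_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

With these helpers, example_filename_ok("report.txt") and
example_filename_ok("\xff\xfe") both return true, since the kernel does
not care about character encodings, while example_filename_ok("a/b")
returns false. The string returned by example_path_codeset() depends on
the environment the program runs in.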