[PATCH v5] man/man7/pathname.7: Add file documenting format of pathnames

The goal of this new manual page is to help people create programs that
do the right thing even in the face of unusual paths.  The information
that I used to create this new manual page came from these sources:

• <https://unix.stackexchange.com/a/39179/316181>
• <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
• <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
• <man:unix(7)>
• <https://unix.stackexchange.com/q/92426/316181>

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
Here’s what I changed from the previous version:

• I stopped saying that the kernel has a 255-byte limit on filenames.
  Florian was right, you can create files with names longer than 255
  characters.  I tried it, and I was able to create a file with a 355-character
  long name on both tmpfs and bcachefs.  This leaves us with one problem,
  though.  In <linux/limits.h>, NAME_MAX is defined as 255 and has a comment
  that says “chars in a file name” [1].  POSIX says that NAME_MAX is the
  “Maximum number of bytes in a filename (not including the terminating null of
  a filename string)” [2].  Why is NAME_MAX set to 255 if you can have longer
  filenames?
• I switched from the Amiga filesystem example back to the ext4 filesystem
  example.  The only reason why I had used the Amiga filesystem example was
  because I had previously thought that 255 bytes was the maximum for any
  filename, regardless of the filesystem.  I think that ext4 is a better
  example because people are more likely to use an ext4 filesystem than an
  Amiga filesystem.
• I implemented all of Alex’s suggestions, except for the ones that
  no longer apply because they were suggestions for text that was deleted for
  other reasons.
• I added an example program.

[1]: <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
[2]: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html#tag_14_26_03_02>

 man/man7/pathname.7 | 151 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)
 create mode 100644 man/man7/pathname.7

diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
new file mode 100644
index 000000000..9545c3b07
--- /dev/null
+++ b/man/man7/pathname.7
@@ -0,0 +1,151 @@
+.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH pathname 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+pathname,
+filename
+\-
+how pathnames are encoded and interpreted
+.SH DESCRIPTION
+Some system calls allow you to pass a pathname as a parameter.
+When writing code that deals with pathnames,
+there are kernel-space requirements that you must comply with,
+and user-space requirements that you should comply with.
+.P
+The kernel stores pathnames as null-terminated byte sequences.
+The kernel has a few general rules that apply to all pathnames:
+.IP \[bu] 3
+The last byte in the sequence needs to be a null byte.
+.IP \[bu]
+Any other bytes in the sequence need to be non-null bytes.
+.IP \[bu]
+A 0x2F byte is always interpreted as a directory separator (/)
+and cannot be part of a filename.
+.IP \[bu]
+A pathname can be at most 4,096 bytes long.
+A pathname that’s longer than 4,096 bytes
+can be split into multiple smaller pathnames and opened piecewise using
+.BR openat (2).
+.P
+The kernel also has some rules that only apply in certain situations.
+Here are some examples:
+.IP \[bu] 3
+Filenames on the ext4 filesystem can be at most 255 bytes long.
+.IP \[bu]
+Filenames on the vfat filesystem cannot contain a
+0x22, 0x2A, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C or 0x7C byte
+(", *, :, <, >, ?, \[rs] or | in ASCII)
+unless the filesystem was mounted with iocharset set to something unusual.
+.IP \[bu]
+A UNIX domain socket’s sun_path can be at most 108 bytes long (see
+.BR unix (7)
+for details).
+.P
+User space treats pathnames differently.
+User space applications typically expect pathnames to use
+a consistent character encoding.
+For maximum interoperability, programs should use
+.BR nl_langinfo (3)
+to determine the current locale’s codeset.
+Paths should be encoded and decoded using the current locale’s codeset
+in order to help prevent mojibake.
+For maximum interoperability,
+programs and users should also limit
+the characters that they use for their own pathnames to characters in
+.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
+the POSIX Portable Filename Character Set
+.UE .
+.SH EXAMPLES
+The following program demonstrates
+how to ensure that a pathname uses the proper encoding.
+The program starts with a UTF-32 encoded pathname.
+It then calls
+.BR nl_langinfo (3)
+in order to determine what the current locale’s codeset is.
+After that, it uses
+.BR iconv (3)
+to convert the UTF-32 encoded pathname into a locale codeset encoded pathname.
+Finally, the program uses the locale codeset encoded pathname to create
+a file that contains the message “Hello, world!”
+.SS Program source
+.\" SRC BEGIN (pathname_encoding_example.c)
+.EX
+#include <iconv.h>
+#include <langinfo.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <uchar.h>
+\&
+int
+main(void)
+{
+    if (setlocale(LC_ALL, "") == NULL) {
+        exit(EXIT_FAILURE);
+    }
+    char32_t utf32_pathname[] = U"example";
+    size_t characters_in_pathname =
+        (sizeof utf32_pathname / sizeof *utf32_pathname) \- 1;
+    size_t bytes_in_locale_pathname = characters_in_pathname * MB_CUR_MAX + 1;
+    // We use calloc() here to make sure that the output from iconv() is null
+    // terminated.
+    char *locale_pathname = calloc(1, bytes_in_locale_pathname);
+    if (locale_pathname == NULL) {
+        exit(EXIT_FAILURE);
+    }
+\&
+    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
+    if (cd == (iconv_t) \-1) {
+        exit(EXIT_FAILURE);
+    }
+    char *inbuf = (char *) utf32_pathname;
+    size_t inbytesleft =
+        characters_in_pathname * (sizeof *utf32_pathname);
+    char *outbuf = locale_pathname;
+    size_t outbytesleft = bytes_in_locale_pathname;
+    size_t iconv_result;
+    // iconv() doesn’t necessarily convert everything all in one go, so we call
+    // it in a while loop just in case it takes multiple calls to finish
+    // converting everything.
+    while (inbytesleft > 0) {
+        iconv_result =
+            iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
+        if (iconv_result == (size_t) \-1) {
+            exit(EXIT_FAILURE);
+        }
+    }
+    // This ensures that the conversion is 100% complete.  See iconv(3) for
+    // details.
+    iconv_result =
+        iconv(cd, NULL, &inbytesleft, &outbuf, &outbytesleft);
+    if (iconv_result == (size_t) \-1) {
+        exit(EXIT_FAILURE);
+    }
+    if (iconv_close(cd) == \-1) {
+        exit(EXIT_FAILURE);
+    }
+\&
+    FILE *fp = fopen(locale_pathname, "w");
+    if (fp == NULL) {
+        exit(EXIT_FAILURE);
+    }
+    if (fputs("Hello, world!\\n", fp) == EOF) {
+        exit(EXIT_FAILURE);
+    }
+    if (fclose(fp) == EOF) {
+        exit(EXIT_FAILURE);
+    }
+\&
+    free(locale_pathname);
+    exit(EXIT_SUCCESS);
+}
+.EE
+.\" SRC END
+.SH SEE ALSO
+.BR open (2),
+.BR iconv (3),
+.BR nl_langinfo (3),
+.BR path_resolution (7),
+.BR mount (8)
-- 
2.47.1




