Hi Jason,

On Fri, Jan 17, 2025 at 08:02:03AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths. The information
> that I used to create this new manual page came from these sources:
>
> • <https://unix.stackexchange.com/a/39179/316181>
> • <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> • <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
> • <man:unix(7)>
> • <https://unix.stackexchange.com/q/92426/316181>
>
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
> Here’s what I changed from the previous version:
>
> • I stopped saying that the kernel has a 255-byte limit on filenames.
>   Florian was right, you can create files with names longer than 255
>   characters. I tried it, and I was able to create a file with a 355-character
>   long name on both tmpfs and bcachefs. This leaves us with one problem,
>   though. In <linux/limits.h>, NAME_MAX is defined as 255 and has a comment
>   that says “chars in a file name” [1]. POSIX says that NAME_MAX is the
>   “Maximum number of bytes in a filename (not including the terminating null of
>   a filename string).” [2] Why is NAME_MAX set to 255 if you can have longer
>   filenames?

There's fpathconf(3), which might give a different value.  I tend to use
the hard-coded macros in programs (although I use PATH_MAX, since
usually I don't store single filenames).  I think for portability you
should restrict yourself to creating stuff shorter than the hard-coded
macro, but accept up to the fpathconf(3) value (similar to character
sets).

You could test this in your system: alx@devuan:~/tmp/linux$ cat nm.c #include <limits.h> #include <stdio.h> #include <unistd.h> int main(void) { printf("NAME_MAX: %d\n", NAME_MAX); printf("_PC_NAME_MAX: %ld\n", pathconf("/run/", _PC_NAME_MAX)); } alx@devuan:~/tmp/linux$ gcc -Wall -Wextra nm.c alx@devuan:~/tmp/linux$ ./a.out NAME_MAX: 255 _PC_NAME_MAX: 255 alx@devuan:~/tmp/linux$ echo /run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | wc 1 1 444 alx@devuan:~/tmp/linux$ sudo touch /run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa [sudo] password for alx: touch: cannot touch '/run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa': File name too long Curiously, my system is also limited to 255 for tmpfs filesystems but yours is not? I still get longer paths rejected. 
> • I switched from the Amiga filesystem example back to the ext4 filesystem
>   example. The only reason why I had used the Amiga filesystem example was
>   because I had previously thought that 255 bytes was the maximum for any
>   filename, regardless of the filesystem. I think that ext4 is a better
>   example because people are more likely to use an ext4 filesystem than an
>   Amiga filesystem.
> • I implemented all of Alex’s suggestions, except for the ones that
>   no longer apply because they were suggestions for text that was deleted for
>   other reasons.
> • I added an example program.
>
> [1]: <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
> [2]: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html#tag_14_26_03_02>
>
>  man/man7/pathname.7 | 151 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 151 insertions(+)
>  create mode 100644 man/man7/pathname.7
>
> diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
> new file mode 100644
> index 000000000..9545c3b07
> --- /dev/null
> +++ b/man/man7/pathname.7
> @@ -0,0 +1,151 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH pathname 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +pathname,
> +filename
> +\-
> +how pathnames are encoded and interpreted
> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with pathnames,
> +there are kernel-space requirements that you must comply with,
> +and user-space requirements that you should comply with.
> +.P
> +The kernel stores pathnames as null-terminated byte sequences.
> +The kernel has a few general rules that apply to all pathnames:
> +.IP \[bu] 3
> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/)
> +and cannot be part of a filename.
> +.IP \[bu]
> +A pathname can be at most 4,096 bytes long.
> +A pathname that’s longer than 4,096 bytes
> +can be split into multiple smaller pathnames and opened piecewise using
> +.BR openat (2).
> +.P
> +The kernel also has some rules that only apply in certain situations.
> +Here are some examples:
> +.IP \[bu] 3
> +Filenames on the ext4 filesystem can be at most 255 bytes long.
> +.IP \[bu]
> +Filenames on the vfat filesystem cannot contain a
> +0x22, 0x2A, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C or 0x7C byte
> +(", *, :, <, >, ?, \ or | in ASCII)
> +unless the filesystem was mounted with iocharset set to something unusual.
> +.IP \[bu]
> +A UNIX domain socket’s sun_path can be at most 108 bytes long (see
> +.BR unix (7)
> +for details).
> +.P
> +User space treats pathnames differently.
> +User space applications typically expect pathnames to use
> +a consistent character encoding.
> +For maximum interoperability, programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.
> +Paths should be encoded and decoded using the current locale’s codeset
> +in order to help prevent mojibake.
> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use for their own pathnames to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
> +.SH EXAMPLES
> +The following program demonstrates
> +how to ensure that a pathname uses the proper encoding.
> +The program starts with a UTF-32 encoded pathname.
> +It then calls
> +.BR nl_langinfo (3)
> +in order to determine what the current locale’s codeset is.
> +After that, it uses
> +.BR iconv (3)
> +to convert the UTF-32 encoded pathname into a locale codeset encoded pathname.
> +Finally, the program uses the locale codeset encoded pathname to create
> +a file that contains the message “Hello, world!”
> +.SS Program source
> +.\" SRC BEGIN (pathname_encoding_example.c)
> +.EX
> +#include <iconv.h>
> +#include <langinfo.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <uchar.h>
> +\&
> +int
> +main(void)
> +{
> +    if (setlocale(LC_ALL, "") == NULL) {
> +        exit(EXIT_FAILURE);

I prefer showing an error message on errors.  For example:

	err(EXIT_FAILURE, "setlocale");

> +    }
> +    char32_t *utf32_pathname = U"example";

You probably wanted an array, not a pointer.

	char32_t utf32_pathname[] = U"example";

> +    size_t characters_in_pathname = (sizeof utf32_pathname) \- 1;

`sizeof utf32_pathname` is 8 (on a 64-bit system).  You're taking the
size of a pointer, not of an array.

Also, sizeof gives you the number of bytes, not elements.

Also, the number of characters in a string is called 'length' (this is
standard nomenclature; see strlen(3)).

You probably wanted this:

	size_t len = nelementsof(utf32_pathname) - 1;

Oh, I'm too far into an uncertain future, and we don't yet know how that
operator will be called.
<https://thephd.dev/the-big-array-size-survey-for-c>

For now, you'll want this:

	#define NELEMS(a)  (sizeof(a) / sizeof((a)[0]))

	size_t len = NELEMS(utf32_pathname) - 1;

> +    size_t bytes_in_locale_pathname =
> +        characters_in_pathname * MB_CUR_MAX + 1;

The number of bytes in an object is called 'size'.  This is also
standard nomenclature.

	size_t size = len * MB_CUR_MAX + 1;

Have a lovely day!
Alex

> +    // We use calloc() here to make sure that the output from iconv() is null
> +    // terminated.

Doesn't iconv(3) terminate its output?  I've never used that API, so I
don't know.

> +    char *locale_pathname = calloc(1, bytes_in_locale_pathname);

I prefer it reversed: we're allocating n bytes (of size 1), not 1
element of a weird size.
Remember the prototype is:

	void *calloc(size_t n, size_t size);

> +    if (locale_pathname == NULL) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
> +    if (cd == (iconv_t) \- 1) {
> +        exit(EXIT_FAILURE);
> +    }
> +    char *inbuf = (char *) utf32_pathname;
> +    size_t inbytesleft =
> +        characters_in_pathname * (sizeof *utf32_pathname);
> +    char *outbuf = locale_pathname;
> +    size_t outbytesleft = bytes_in_locale_pathname;
> +    size_t iconv_result;
> +    // iconv() doesn’t necessarily convert everything all in one go, so we call
> +    // it in a while loop just in case it takes multiple calls to finish
> +    // converting everything.
> +    while (inbytesleft > 0) {
> +        iconv_result =
> +            iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
> +        if (iconv_result == \-1) {
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +    // This ensures that the conversion is 100% complete. See iconv(3) for
> +    // details.
> +    iconv_result =
> +        iconv(cd, NULL, &inbytesleft, &outbuf, &outbytesleft);
> +    if (iconv_result == \-1) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (iconv_close(cd) == \-1) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    FILE *fp = fopen(locale_pathname, "w");
> +    if (fp == NULL) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (fputs("Hello, world!\\n", fp) == EOF) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (fclose(fp) == EOF) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    free(locale_pathname);
> +    exit(EXIT_SUCCESS);
> +}
> +.EE
> +.\" SRC END
> +.SH SEE ALSO
> +.BR open (2),
> +.BR iconv (3),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7),
> +.BR mount (8)
> -- 
> 2.47.1
>

-- 
<https://www.alejandro-colomar.es/>
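
To make the array-versus-pointer and length-versus-size comments in the review above concrete, here is a minimal sketch.  It is only an illustration under assumptions: the helpers pathname_len() and locale_pathname_size() are hypothetical names that do not appear in the patch, and the worst-case size formula is the one the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <uchar.h>

/* Number of elements in an array (must not be used on a pointer).  */
#define NELEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* An array, so sizeof sees the whole string, including the final null.  */
static const char32_t  utf32_pathname[] = U"example";

/* Hypothetical helper: length in characters, excluding the final null.  */
static size_t
pathname_len(void)
{
	return NELEMS(utf32_pathname) - 1;
}

/* Hypothetical helper: size in bytes of a buffer that can hold the
   pathname converted to the locale codeset, plus a terminating null.  */
static size_t
locale_pathname_size(void)
{
	size_t  mb_max = MB_CUR_MAX;

	assert(mb_max >= 1);
	return pathname_len() * mb_max + 1;
}
```

With these, the allocation in the example would read calloc(locale_pathname_size(), 1), and the input byte count passed to iconv(3) would be pathname_len() * sizeof(utf32_pathname[0]).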