Re: [PATCH v5] man/man7/pathname.7: Add file documenting format of pathnames

Hi Jason,

On Fri, Jan 17, 2025 at 08:02:03AM -0500, Jason Yundt wrote:
> The goal of this new manual page is to help people create programs that
> do the right thing even in the face of unusual paths.  The information
> that I used to create this new manual page came from these sources:
> 
> • <https://unix.stackexchange.com/a/39179/316181>
> • <https://sourceware.org/pipermail/libc-help/2024-August/006737.html>
> • <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/ext4/ext4.h?h=v6.12.9#n2288>
> • <man:unix(7)>
> • <https://unix.stackexchange.com/q/92426/316181>
> 
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
> Here’s what I changed from the previous version:
> 
> • I stopped saying that the kernel has a 255-byte limit on filenames.
>   Florian was right, you can create files with names longer than 255
>   characters.  I tried it, and I was able to create a file with a 355-character
>   long name on both tmpfs and bcachefs.  This leaves us with one problem,
>   though.  In <linux/limits.h>, NAME_MAX is defined as 255 and has a comment
>   that says “chars in a file name” [1].  POSIX says that NAME_MAX is the
>   “Maximum number of bytes in a filename (not including the terminating null of
>   a filename string).” [2]  Why is NAME_MAX set to 255 if you can have longer
>   filenames?

There's fpathconf(3), which might give a different value.  I tend to use
the hardcoded macros in programs (although I use PATH_MAX, since
usually I don't store single filenames).

I think for portability you should restrict yourself to creating stuff
shorter than the hard-coded macro, but accept up to the fpathconf(3)
value (similar to character sets).

You could test this on your system:

	alx@devuan:~/tmp/linux$ cat nm.c 
	#include <limits.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		printf("NAME_MAX: %d\n", NAME_MAX);
		printf("_PC_NAME_MAX: %ld\n", pathconf("/run/", _PC_NAME_MAX));
	}
	alx@devuan:~/tmp/linux$ gcc -Wall -Wextra nm.c 
	alx@devuan:~/tmp/linux$ ./a.out 
	NAME_MAX: 255
	_PC_NAME_MAX: 255
	alx@devuan:~/tmp/linux$ echo /run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | wc
	      1       1     444
	alx@devuan:~/tmp/linux$ sudo touch /run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
	[sudo] password for alx: 
	touch: cannot touch '/run/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa': File name too long

Curiously, my tmpfs filesystems are also limited to 255 bytes per
filename, but yours aren't?  I still get longer names rejected.


> • I switched from the Amiga filesystem example back to the ext4 filesystem
>   example.  The only reason why I had used the Amiga filesystem example was
>   that I had previously thought that 255 bytes was the maximum for any
>   filename, regardless of the filesystem.  I think that ext4 is a better
>   example because people are more likely to use an ext4 filesystem than an
>   Amiga filesystem.
> • I implemented all of Alex’s suggestions, except for the ones that
>   no longer apply because they were suggestions for text that was deleted for
>   other reasons.
> • I added an example program.
> 
> [1]: <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/uapi/linux/limits.h?h=v6.12.9#n12>
> [2]: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html#tag_14_26_03_02>
> 
>  man/man7/pathname.7 | 151 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 151 insertions(+)
>  create mode 100644 man/man7/pathname.7
> 
> diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
> new file mode 100644
> index 000000000..9545c3b07
> --- /dev/null
> +++ b/man/man7/pathname.7
> @@ -0,0 +1,151 @@
> +.\" Copyright (C) 2025 Jason Yundt (jason@jasonyundt.email)
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH pathname 7 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +pathname,
> +filename
> +\-
> +how pathnames are encoded and interpreted
> +.SH DESCRIPTION
> +Some system calls allow you to pass a pathname as a parameter.
> +When writing code that deals with pathnames,
> +there are kernel-space requirements that you must comply with,
> +and user-space requirements that you should comply with.
> +.P
> +The kernel stores pathnames as null-terminated byte sequences.
> +The kernel has a few general rules that apply to all pathnames:
> +.IP \[bu] 3
> +The last byte in the sequence needs to be a null byte.
> +.IP \[bu]
> +Any other bytes in the sequence need to be non-null bytes.
> +.IP \[bu]
> +A 0x2F byte is always interpreted as a directory separator (/)
> +and cannot be part of a filename.
> +.IP \[bu]
> +A pathname can be at most 4,096 bytes long,
> +including the terminating null byte.
> +A pathname that’s longer than that
> +can be split into multiple smaller pathnames and opened piecewise using
> +.BR openat (2).
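The piecewise approach could look roughly like this.  It's only a
sketch: open_piecewise() is a hypothetical helper, it assumes
POSIX.1-2008 openat(2) and O_DIRECTORY, and it doesn't support O_CREAT
(there's no mode argument).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open PATH (relative to DIRFD, or absolute) one
   component at a time, so each openat(2) call only sees a single
   filename rather than the whole, possibly over-long, pathname.
   FLAGS applies to the last component and must not include O_CREAT.
   Returns a new file descriptor, or -1 with errno set. */
static int
open_piecewise(int dirfd, const char *path, int flags)
{
	int   fd, nfd, saved_errno;
	char  *copy, *comp, *next, *save;

	if (path[0] == '\0') {
		errno = ENOENT;  /* like open(2) with an empty pathname */
		return -1;
	}

	copy = strdup(path);
	if (copy == NULL)
		return -1;

	/* Start from "/" for absolute paths, otherwise from DIRFD. */
	if (path[0] == '/')
		fd = open("/", O_RDONLY | O_DIRECTORY);
	else
		fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY);

	comp = strtok_r(copy, "/", &save);  /* skips empty components */
	while (fd != -1 && comp != NULL) {
		next = strtok_r(NULL, "/", &save);

		/* Intermediate components must be directories. */
		nfd = openat(fd, comp,
		             next != NULL ? O_RDONLY | O_DIRECTORY : flags);
		saved_errno = errno;
		close(fd);
		errno = saved_errno;

		fd = nfd;
		comp = next;
	}

	saved_errno = errno;
	free(copy);
	errno = saved_errno;
	return fd;
}
```

Symbolic links and ".." components are still resolved by openat(2) as
usual; the only thing this avoids is passing the whole pathname to a
single call.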
> +.P
> +The kernel also has some rules that only apply in certain situations.
> +Here are some examples:
> +.IP \[bu] 3
> +Filenames on the ext4 filesystem can be at most 255 bytes long.
> +.IP \[bu]
> +Filenames on the vfat filesystem cannot contain a
> +0x22, 0x2A, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C or 0x7C byte
> +(", *, :, <, >, ?, \[rs] or | in ASCII)
> +unless the filesystem was mounted with iocharset set to something unusual.
> +.IP \[bu]
> +A UNIX domain socket’s sun_path can be at most 108 bytes long (see
> +.BR unix (7)
> +for details).
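To make the byte values concrete, a hypothetical check for the vfat
bytes listed above could be as small as this.  It's a sketch that
checks neither length limits nor any iocharset-specific behavior.

```c
#include <stdbool.h>
#include <string.h>

/* The bytes the vfat bullet lists as forbidden, plus 0x2F ('/'),
   which no filename may contain on any filesystem. */
static const char vfat_forbidden[] = {
	0x22, 0x2A, 0x2F, 0x3A, 0x3C, 0x3E, 0x3F, 0x5C, 0x7C, '\0'
};

/* Hypothetical helper: true if NAME is non-empty and contains none
   of the bytes above. */
static bool
vfat_name_ok(const char *name)
{
	return name[0] != '\0' && strpbrk(name, vfat_forbidden) == NULL;
}
```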
> +.P
> +User space treats pathnames differently.
> +User space applications typically expect pathnames to use
> +a consistent character encoding.
> +Programs should use
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset,
> +and should encode and decode pathnames using that codeset
> +to help prevent mojibake.
> +For maximum interoperability,
> +programs and users should also limit
> +the characters that they use in their own pathnames to characters in
> +.UR https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265
> +the POSIX Portable Filename Character Set
> +.UE .
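A minimal sketch of checking a single filename against the POSIX
Portable Filename Character Set (A-Z, a-z, 0-9, '.', '_' and '-') might
look like this.  The helper name is hypothetical, and rejecting a
leading '-' is an added precaution (such names look like command-line
options), not part of the character set itself.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if NAME uses only the portable filename
   character set.  '/' is not in the set, so NAME must be a single
   filename, not a pathname. */
static bool
is_portable_filename(const char *name)
{
	static const char  pfcs[] =
	    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "abcdefghijklmnopqrstuvwxyz"
	    "0123456789._-";

	return name[0] != '\0'
	    && name[0] != '-'  /* extra precaution, see above */
	    && strspn(name, pfcs) == strlen(name);
}
```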
> +.SH EXAMPLES
> +The following program demonstrates
> +how to ensure that a pathname uses the proper encoding.
> +The program starts with a UTF-32-encoded pathname.
> +It then calls
> +.BR nl_langinfo (3)
> +to determine the current locale’s codeset.
> +After that, it uses
> +.BR iconv (3)
> +to convert the UTF-32-encoded pathname into that codeset.
> +Finally, the program uses the converted pathname to create
> +a file that contains the message “Hello, world!”
> +.SS Program source
> +.\" SRC BEGIN (pathname_encoding_example.c)
> +.EX
> +#include <iconv.h>
> +#include <langinfo.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <uchar.h>
> +\&
> +int
> +main(void)
> +{
> +    if (setlocale(LC_ALL, "") == NULL) {
> +        exit(EXIT_FAILURE);

I prefer printing an error message on failure (see err(3), declared in
<err.h>).  For example:

	err(EXIT_FAILURE, "setlocale");

> +    }
> +    char32_t *utf32_pathname = U"example";

You probably wanted an array, not a pointer.

	char32_t  utf32_pathname[] = U"example";

> +    size_t characters_in_pathname = (sizeof utf32_pathname) \- 1;

`sizeof utf32_pathname` is the size of a pointer (8 bytes on x86-64),
not of the array.  Also, sizeof gives you the number of bytes, not
elements.  Also, the number of characters in a string is called 'length'
(this is standard nomenclature; see strlen(3)).  You probably wanted this:

	size_t  len = nelementsof(utf32_pathname) - 1;

Oh, I'm too far into an uncertain future; we don't yet know what that
operator will be called.
<https://thephd.dev/the-big-array-size-survey-for-c>
For now, you'll want this:

	#define NELEMS(a)  (sizeof(a) / sizeof((a)[0]))

	size_t  len = NELEMS(utf32_pathname) - 1;

> +    size_t bytes_in_locale_pathname =
> +        characters_in_pathname * MB_CUR_MAX + 1;

The number of bytes in an object is called 'size'.  This is also
standard nomenclature.

	size_t  size = len * MB_CUR_MAX + 1;
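Putting these suggestions together (an array, NELEMS(), len and size),
a hedged sketch of the conversion step might look like the following.
It is not the code from the patch: the helper name to_locale_pathname()
is hypothetical, it uses calloc(size, 1) as suggested below, and it
assumes GCC/Clang's __BYTE_ORDER__ macros to name the native-endian
UTF-32 encoding explicitly, since a char32_t string carries no
byte-order mark.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <uchar.h>

#define NELEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Name the native byte order explicitly (assumes GCC/Clang). */
#if !defined(__BYTE_ORDER__)
# error "needs GCC/Clang's __BYTE_ORDER__"
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define UTF32_NATIVE  "UTF-32LE"
#else
# define UTF32_NATIVE  "UTF-32BE"
#endif

/* Hypothetical helper: convert the LEN characters in S to the current
   locale's codeset.  Returns a null-terminated string allocated with
   calloc(3), or NULL on failure. */
static char *
to_locale_pathname(const char32_t *s, size_t len)
{
	char     *buf, *inbuf, *outbuf;
	size_t   size, inleft, outleft;
	iconv_t  cd;

	size = len * MB_CUR_MAX + 1;
	buf = calloc(size, 1);  /* zero-filled, so the result is terminated */
	if (buf == NULL)
		return NULL;

	cd = iconv_open(nl_langinfo(CODESET), UTF32_NATIVE);
	if (cd == (iconv_t) -1) {
		free(buf);
		return NULL;
	}

	inbuf = (char *) s;
	inleft = len * sizeof(*s);
	outbuf = buf;
	outleft = size - 1;  /* reserve the last byte for the terminator */

	/* iconv(3) only succeeds once the whole input is converted, so one
	   call is enough; the second call, with a NULL input, writes any
	   sequence needed to return to the initial shift state. */
	if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1
	    || iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1)
	{
		iconv_close(cd);
		free(buf);
		return NULL;
	}

	iconv_close(cd);
	return buf;
}
```

The caller would then pass the result to fopen(3), and free(3) it
afterwards.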


Have a lovely day!
Alex

> +    // We use calloc() here to make sure that the output from iconv() is null
> +    // terminated.

Doesn't iconv(3) terminate its output?  I've never used that API, so I
don't know.

> +    char *locale_pathname = calloc(1, bytes_in_locale_pathname);

I prefer it reversed:  we're allocating n bytes (of size 1), not
1 element of a weird size.  Remember the prototype is:

	void *calloc(size_t n, size_t size);

> +    if (locale_pathname == NULL) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
> +    if (cd == (iconv_t) \-1) {
> +        exit(EXIT_FAILURE);
> +    }
> +    char *inbuf = (char *) utf32_pathname;
> +    size_t inbytesleft =
> +        characters_in_pathname * (sizeof *utf32_pathname);
> +    char *outbuf = locale_pathname;
> +    size_t outbytesleft = bytes_in_locale_pathname;
> +    size_t iconv_result;
> +    // iconv() doesn’t necessarily convert everything all in one go, so we call
> +    // it in a while loop just in case it takes multiple calls to finish
> +    // converting everything.
> +    while (inbytesleft > 0) {
> +        iconv_result =
> +            iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
> +        if (iconv_result == (size_t) \-1) {
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +    // This ensures that the conversion is 100% complete.  See iconv(3) for
> +    // details.
> +    iconv_result =
> +        iconv(cd, NULL, &inbytesleft, &outbuf, &outbytesleft);
> +    if (iconv_result == (size_t) \-1) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (iconv_close(cd) == \-1) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    FILE *fp = fopen(locale_pathname, "w");
> +    if (fp == NULL) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (fputs("Hello, world!\\n", fp) == EOF) {
> +        exit(EXIT_FAILURE);
> +    }
> +    if (fclose(fp) == EOF) {
> +        exit(EXIT_FAILURE);
> +    }
> +\&
> +    free(locale_pathname);
> +    exit(EXIT_SUCCESS);
> +}
> +.EE
> +.\" SRC END
> +.SH SEE ALSO
> +.BR open (2),
> +.BR iconv (3),
> +.BR nl_langinfo (3),
> +.BR path_resolution (7),
> +.BR mount (8)
> -- 
> 2.47.1
> 
> 

-- 
<https://www.alejandro-colomar.es/>
