[PATCH v1] man/man7/pathname.7: Pathnames are opaque C strings

On Mon, Jan 27, 2025 at 07:27:59PM +0100, наб wrote:
> Skimming the thread: UNIX paths are sequences of non-NUL bytes.
>
> It is never correct to expect to be able to have a (parse, unparse)
> operation pair for which unparse(parse(x)) = x for path x.
>
> It's obviously wrong to reject a pathname just because you don't like it.
>
> Thus, when displaying a path, either (a) dump it directly to the output
> (the user has configured their display device to understand the paths they use),
> or if that's not possible (b) setlocale(LC_ALL, "") + mbrtowc() loop
> and render the result (applying usual ?/� substitutions for mbrtowc()
> errors makes sense here).
>
> There are very few operations on paths that are actually reasonable
> to do, ever; those are: appending stuff, prepending stuff
> (this is just appending stuff with the arguments backwards),
> and cleaving at /es;
> the "stuff" better be copied whole-sale from some other path
> or an unprocessed argument (or, sure, the PFCS).
>
> If you're getting bytes to append to a path, do that directly.
>
> If you're getting characters to append to a path,
> then wctomb(3) is the only non-invalid solution,
> since that (obviously) turns characters into bytes in the current
> locale, which (ex def) is the operation desired.
>
> I don't understand what the UTF-32 dance is supposed to be.
>
> If you're recommending transcoding paths, don't.
>
> To re-iterate: paths are not character sequences.
> They do not represent characters.
> You can't meaningfully coerce them thusly without loss of precision
> (this is ok to do for display! and nothing else).
> If at any point you find yourself turning wchar_t -> char
> you are doing something wrong;
> if you find yourself doing char -> wchar_t for anything beside display
> you should probably reconsider.
>
> This is different under Win32 of course. But that concerns us naught.

Suggested-by: наб <nabijaczleweli@xxxxxxxxxxxxxxxxxx>
Cc: Jason Yundt <jason@jasonyundt.email>
Cc: Florian Weimer <fweimer@xxxxxxxxxx>
Cc: "G. Branden Robinson" <branden@xxxxxxxxxx>
Signed-off-by: Alejandro Colomar <alx@xxxxxxxxxx>
---

Hi наб!

Thanks for the detailed response.  I applied this patch based on it.
Does it sound good to you?  Please review.
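
For anyone following along, option (b) from the quoted message could
look roughly like the sketch below.  It is an illustration only and not
part of the patch: print_pathname() is a hypothetical name, '?' is just
one of the substitutions mentioned above, and the caller is assumed to
have already called setlocale(LC_ALL, "").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of the patch): write a pathname to fp
 * for display purposes only.  Byte sequences that mbrtowc(3) accepts
 * in the current locale are copied through unchanged; each byte that
 * starts an invalid or truncated sequence is replaced with '?'.
 * The result is lossy and must never be used as a pathname again.
 */
static void
print_pathname(FILE *fp, const char *path)
{
    size_t     n, len;
    wchar_t    wc;
    mbstate_t  st;

    assert(fp != NULL && path != NULL);

    memset(&st, 0, sizeof(st));
    len = strlen(path);

    while (len > 0) {
        n = mbrtowc(&wc, path, len, &st);
        if (n == (size_t) -1 || n == (size_t) -2) {
            /* Undecodable: substitute, reset the state, skip one byte. */
            fputc('?', fp);
            memset(&st, 0, sizeof(st));
            n = 1;
        } else {
            /* Defensive: never advance by zero bytes. */
            if (n == 0)
                n = 1;
            fwrite(path, 1, n, fp);
        }
        path += n;
        len -= n;
    }
}
```

A caller would run setlocale(LC_ALL, "") once at startup and then use
print_pathname(stdout, path) wherever a pathname is shown to the user,
while continuing to pass the original, untouched bytes to open(2) and
friends.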

Have a lovely day!
Alex
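
As a companion illustration of the "appending stuff" and "cleaving at
/es" operations from the quoted message, here is a minimal sketch that
works on bytes only and never decodes anything.  The names path_join()
and path_last_component() are hypothetical, and the sketch assumes the
inputs are ordinary C strings short enough that the length arithmetic
cannot overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated "dir/name".
 * Both arguments are copied byte for byte; no decoding takes place.
 * Returns NULL if malloc(3) fails.  The caller frees the result.
 */
static char *
path_join(const char *dir, const char *name)
{
    char    *p;
    size_t  dlen, nlen;

    assert(dir != NULL && name != NULL);

    dlen = strlen(dir);
    nlen = strlen(name);

    p = malloc(dlen + 1 + nlen + 1);
    if (p == NULL)
        return NULL;

    memcpy(p, dir, dlen);
    p[dlen] = '/';
    memcpy(p + dlen + 1, name, nlen + 1);  /* includes the null byte */

    return p;
}

/*
 * Hypothetical helper: return a pointer to the bytes after the last
 * '/' in path, or path itself if it contains no '/'.
 */
static const char *
path_last_component(const char *path)
{
    const char  *slash;

    assert(path != NULL);

    slash = strrchr(path, '/');
    return (slash != NULL) ? slash + 1 : path;
}
```

Because nothing is decoded, a name such as "\xff\xfe" (not valid in any
UTF-8 locale) survives path_join() and path_last_component() unchanged.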


 man/man7/pathname.7 | 87 ++-------------------------------------------
 1 file changed, 2 insertions(+), 85 deletions(-)

diff --git a/man/man7/pathname.7 b/man/man7/pathname.7
index 59650ef6e..996436606 100644
--- a/man/man7/pathname.7
+++ b/man/man7/pathname.7
@@ -17,7 +17,7 @@ .SH DESCRIPTION
 The kernel stores pathnames as C strings,
 that is,
 sequences of non-null bytes terminated by a null byte.
-The kernel has a few general rules that apply to all pathnames:
+There are a few general rules that apply to all pathnames:
 .IP \[bu] 3
 The last byte in the sequence needs to be a null byte.
 .IP \[bu]
@@ -59,17 +59,8 @@ .SH DESCRIPTION
 .P
 Some filesystems or APIs may apply further restrictions,
 such as requiring shorter filenames,
-or restricting the allowed characters in a filename.
+or restricting the allowed bytes in a filename.
 .P
-User-space programs treat pathnames differently.
-They typically expect pathnames to
-use a consistent character encoding.
-For maximum interoperability,
-programs should use
-.BR nl_langinfo (3)
-to determine the current locale's codeset.
-Pathnames should be encoded and decoded using the current locale's codeset
-in order to help prevent mojibake.
 For maximum interoperability,
 programs and users should also
 limit the characters that they use for their own pathnames to
@@ -77,83 +68,9 @@ .SH DESCRIPTION
 .UR https://pubs.opengroup.org/\:onlinepubs/\:9799919799/\:basedefs/\:V1_chap03.html#tag_03_265
 Portable Filename Character Set
 .UE .
-.SH EXAMPLES
-The following program demonstrates
-how to ensure that a pathname uses the proper encoding.
-The program starts with a UTF-32 encoded pathname.
-It then calls
-.BR nl_langinfo (3)
-in order to determine what the current locale's codeset is.
-After that, it uses
-.BR iconv (3)
-to convert the UTF-32-encoded pathname into a locale-codeset-encoded pathname.
-Finally,
-the program uses the locale-codeset-encoded pathname
-to create a file that contains the message \[lq]Hello, world!\[rq].
-.SS Program source
-.\" SRC BEGIN (pathname_encoding_example.c)
-.EX
-#include <err.h>
-#include <iconv.h>
-#include <langinfo.h>
-#include <locale.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <uchar.h>
-\&
-#define NELEMS(a)  (sizeof(a) / sizeof(a[0]))
-\&
-int
-main(void)
-{
-    char      *locale_pathname;
-    char      *in, *out;
-    FILE      *fp;
-    size_t    size;
-    size_t    inbytes, outbytes;
-    iconv_t   cd;
-    char32_t  utf32_pathname[] = U"María";
-\&
-    if (setlocale(LC_ALL, "") == NULL)
-        err(EXIT_FAILURE, "setlocale");
-\&
-    size = NELEMS(utf32_pathname) * MB_CUR_MAX;
-    locale_pathname = malloc(size);
-    if (locale_pathname == NULL)
-        err(EXIT_FAILURE, "malloc");
-\&
-    cd = iconv_open(nl_langinfo(CODESET), "UTF\-32");
-    if (cd == (iconv_t)\-1)
-        err(EXIT_FAILURE, "iconv_open");
-\&
-    in = (char *) utf32_pathname;
-    inbytes = sizeof(utf32_pathname);
-    out = locale_pathname;
-    outbytes = size;
-    if (iconv(cd, &in, &inbytes, &out, &outbytes) == (size_t) \-1)
-        err(EXIT_FAILURE, "iconv");
-\&
-    if (iconv_close(cd) == \-1)
-        err(EXIT_FAILURE, "iconv_close");
-\&
-    fp = fopen(locale_pathname, "w");
-    if (fp == NULL)
-        err(EXIT_FAILURE, "fopen");
-\&
-    fputs("Hello, world!\[rs]n", fp);
-    if (fclose(fp) == EOF)
-        err(EXIT_FAILURE, "fclose");
-\&
-    free(locale_pathname);
-    exit(EXIT_SUCCESS);
-}
-.EE
-.\" SRC END
 .SH SEE ALSO
 .BR limits.h (0p),
 .BR open (2),
 .BR fpathconf (3),
-.BR iconv (3),
-.BR nl_langinfo (3),
 .BR path_resolution (7),
 .BR mount (8)

Range-diff against v0:
-:  --------- > 1:  b9f5079f6 man/man7/pathname.7: Pathnames are opaque C strings
-- 
2.47.2
