Re: [PATCH v2] man/man7/path-format.7: Add file documenting format of pathnames

At 2025-01-15T17:47:58+0100, Alejandro Colomar wrote:
> On Wed, Jan 15, 2025 at 11:21:02AM -0500, Jason Yundt wrote:
> > > Makes sense.  How about a null-terminated string?
> > 
> > The term null-terminated string still has some of the problems that
> > I mentioned earlier.  Specifically, people think of null-terminated
> > strings as sequences of characters.  It’s easier to understand how
> > the kernel handles paths if you think of paths as sequences of
> > bytes, not as sequences of characters.
> 
> Hmmm, okay.  Maybe I'm too biased as a C programmer, and this being a
> generic page for users it makes sense to use other terms.

There are many ways to represent strings.  C is not the whole world.  :)

I think Jason has a good point.  When considering byte sequences as
simple small integers (some values of which are perhaps invalid), I
think it's clearer to articulate them as such.

Here, for instance, if I'm understanding Jason correctly, I might say
"byte sequence terminated by a zero value".

I think assembly programmers used to call that "ASCIZ".  And they got up
to all sorts of mischief in the eighth bit...
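
To make the "byte sequence terminated by a zero value" view concrete,
here is a minimal Python sketch.  The helper names (to_asciz,
is_valid_utf8) and the sample file names are made up for illustration;
nothing here is taken from the proposed manual page.

```python
def to_asciz(name: bytes) -> bytes:
    """Append the terminating zero byte, rejecting an embedded one."""
    if b"\0" in name:
        # A zero byte cannot appear inside the name: it would end it early.
        raise ValueError("a path cannot contain a zero byte")
    return name + b"\0"


def is_valid_utf8(name: bytes) -> bool:
    """Report whether the bytes happen to decode as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample names: one is UTF-8 text, the other is Latin-1
# bytes that are not valid UTF-8.  Both are acceptable byte sequences.
utf8_name = "いぬのおまわりさん.flac".encode("utf-8")
latin1_name = b"caf\xe9.txt"

for raw in (utf8_name, latin1_name):
    print(to_asciz(raw), is_valid_utf8(raw))
```

The point of the sketch is that the only value treated specially is
zero; whether the remaining bytes spell characters in some encoding is
a separate question.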

> > That being said, I think that you misunderstood my two questions.
> > You told me the current state of things.  I’m not asking about the
> > current state of things, I’m asking about a hypothetical future
> > where programs started to “assume the Portable Filename Character
> > Set (or at most some subset of ASCII), and fail hard outside of
> > that”.  If we start making that recommendation and programs start
> > following that recommendation, then it sounds like I wouldn’t be
> > able to do anything with a large part of my music collection,
> 
> You could rename that music into something usable, and then use it.  :)

If you tell Japanese users they can't name a music file
"いぬのおまわりさん.flac", they might run over you with a truck.  ;-)

(This reference may be intelligible only to members of Gen X.)

> I would be happy in a world where all tools are restricted to the
> portable filename character set.  I once toyed with a patch for
> enforcing such filenames in the kernel, just for fun.

I've been pleased to start moving GNU troff in the _opposite_ direction.

NEWS from the forthcoming 1.24.0 release:

*  GNU troff now strips a leading neutral double quote from the argument
   to the `cf`, `hpf`, `hpfa`, `mso`, `msoquiet`, `nx`, `pi`, `pso`,
   `so`, `soquiet`, `sy`, and `trf` requests, and the second argument to
   the `open` and `opena` requests, allowing it to contain embedded
   leading spaces.

*  GNU troff now accepts space characters in the argument to the `cf`,
   `hpf`, `hpfa`, `mso`, `msoquiet`, `nx`, `so`, `soquiet`, and `trf`
   requests, and the second argument to the `open` and `opena` requests.
   See "soelim" below.

*  soelim no longer requires embedded space characters in `so` arguments
   to be backslash-escaped.  (It continues to support that syntax, even
   though neither AT&T nor GNU troff ever has.)  If the argument to a
   `so` request must contain leading spaces, any such sequence of spaces
   must now be prefixed with a double quote character ("), which the
   program then discards.  These changes are to better align this
   program's parsing rules with the language of the formatter; consider
   the `ds` and `as` requests.
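
As a rough illustration of the rule described in the NEWS items above,
the following Python sketch extracts the file name from a `so` request
line.  It is a simplified model written for this message, not groff's
or soelim's actual parser: the function name so_argument is
hypothetical, and it ignores comments, escape sequences, and the older
backslash-escaped-space syntax.

```python
def so_argument(line: str):
    """Return the file name argument of a `.so` line, or None."""
    line = line.rstrip("\n")
    if not line.startswith(".so"):
        return None
    rest = line[3:]
    # Require a space after the request name so that, for example,
    # `.soquiet` is not mistaken for `.so`.
    if not rest.startswith(" "):
        return None
    arg = rest.lstrip(" ")
    # A leading double quote is discarded; it exists only to protect
    # leading spaces in the file name that follows it.
    if arg.startswith('"'):
        arg = arg[1:]
    return arg or None


print(so_argument('.so my chapter.ms'))
print(so_argument('.so "  indented name.ms'))
```

Embedded spaces need no special treatment in this model because the
argument simply runs to the end of the line, which is how `ds` and `as`
treat their string argument.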

In 1.25 I want to support the use of groff-style Unicode special
character escape sequences to encode byte sequences in file names.
Notice that I do say _bytes_, so the range will be limited: \[u0000] to
\[u00FF].  But that will be enough to encode UTF-8, or sickness like
UTF-16LE.

https://savannah.gnu.org/bugs/index.php?65108
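
To show what encoding bytes in the range \[u0000] to \[u00FF] could
look like, here is a hedged Python sketch.  The function names are
hypothetical, and the choice of which bytes to leave unescaped
(printable ASCII other than space and backslash) is an assumption made
for this example; the syntax groff 1.25 finally accepts may differ.

```python
import re


def bytes_to_escapes(raw: bytes) -> str:
    """Spell each byte as itself or as a \\[u00XX] escape."""
    out = []
    for b in raw:
        if 0x21 <= b <= 0x7E and b != 0x5C:
            # Printable ASCII other than space and backslash: keep as is.
            out.append(chr(b))
        else:
            # Everything else becomes a four-digit uppercase hex escape.
            out.append("\\[u%04X]" % b)
    return "".join(out)


_ESCAPE = re.compile(r"\\\[u00([0-9A-F]{2})\]")


def escapes_to_bytes(text: str) -> bytes:
    """Invert bytes_to_escapes: turn \\[u00XX] back into single bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("ascii")
        out.append(int(m.group(1), 16))
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


name = "é.ms".encode("utf-8")
print(bytes_to_escapes(name))
```

Because each escape stands for one byte rather than one character, the
same scheme carries UTF-8, UTF-16LE, or byte sequences that are valid
in no encoding at all.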

> On the other hand, I see the usefulness for others in programs trying
> to work with other stuff.  So the manual page makes sense, and I'll
> swallow my disagreement.  :-)

[digression into software development philosophy follows]

You're joining the side of the angels.

Authors of literature (fiction, academic, legal, technical, etc.) tend
to be unimpressed by some of the limitations on representation that
systems programmers find obvious and sensible.

More generally, the whole reason the operating system exists is to
facilitate the efficient execution of _applications_ (or "jobs", as
their card decks were known in the days when a "monitor program" to
occupy a machine's idle cycles was a novel concept).

Systems programming (be it in the kernel per se or at the layer of
general services in user space) can definitely be a great place to spend
one's career, but we do best when we remember that it's not an end in
itself...lest we come to resemble those JavaScript fanatics who seem to
spend all their time fighting wars with each other over "frameworks".

Thus, if a groff user wants to name their document on-disk "Обладала
фактической самостоятельностью.ms", I feel pretty lame if I tell them
they can't.  When dealing with users, a principle I try to follow is to
actively look for ways to say "yes" to their requests.  Reasons for
saying "no" generally don't need to be sought out--they present
themselves with depressing frequency.  Sometimes the user wants an
impossible or infeasible thing.  Beyond the obvious limitations of finite
storage, CPU cycles, and I/O bandwidth, some problems have high lower
bounds on complexity.  Occasionally someone asks for something that
blunders directly into an unsolved problem in computer science.

I omit from the foregoing consideration the phenomenon of users who know
something--but often not enough--about implementation details, and
therefore have a tendency to design "solutions" in their heads and
request those instead of presenting their problem scenario.  This type
of user shows up everywhere, but the bug-bash mailing list is especially
rich with them.  With these people, before you can get to "yes" you have
to ask, "what is it you're trying to do?".  Sometimes they just won't
tell you.  With some of these people, the only way to stay sane is to
start with "no".

"The three most dangerous things in the world are a programmer with a
soldering iron, a hardware type with a program patch, and a user with an
idea." -- Rick Cook

Regards,
Branden


