At 2025-01-15T17:47:58+0100, Alejandro Colomar wrote:
> On Wed, Jan 15, 2025 at 11:21:02AM -0500, Jason Yundt wrote:
> > > Makes sense. How about a null-terminated string?
> >
> > The term null-terminated string still has some of the problems that
> > I mentioned earlier. Specifically, people think of null-terminated
> > strings as sequences of characters. It’s easier to understand how
> > the kernel handles paths if you think of paths as sequences of
> > bytes, not as sequences of characters.
>
> Hmmm, okay. Maybe I'm too biased as a C programmer, and this being a
> generic page for users it makes sense to use other terms.

There are many ways to represent strings. C is not the whole world. :)

I think Jason has a good point. When considering byte sequences as
simple small integers (some values of which are perhaps invalid), I
think it's clearer to articulate them as such.

Here, for instance, if I'm understanding Jason correctly, I might say
"byte sequence terminated by a zero value".

I think assembly programmers used to call that "ASCIZ". And they got up
to all sorts of mischief in the eighth bit...

> > That being said, I think that you misunderstood my two questions.
> > You told me the current state of things. I’m not asking about the
> > current state of things, I’m asking about a hypothetical future
> > where programs started to “assume the Portable Filename Character
> > Set (or at most some subset of ASCII), and fail hard outside of
> > that”. If we start making that recommendation and programs start
> > following that recommendation, then it sounds like I wouldn’t be
> > able to do anything with a large part of my music collection,
>
> You could rename that music into something usable, and then use it. :)

If you tell Japanese users they can't name a music file
"いぬのおまわりさん.flac", they might run over you with a truck. ;-)

(This reference may be intelligible only to members of Gen X.)

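To illustrate the "byte sequence terminated by a zero value" wording, here is a minimal sketch in Python. The helper name `as_path_bytes` is hypothetical, and the choice of UTF-8 is an assumption made for the example; the kernel itself does not care which encoding produced the bytes.

```python
def as_path_bytes(name: str, encoding: str = "utf-8") -> bytes:
    """Return `name` as the zero-terminated byte sequence a C-level API would see.

    Hypothetical helper for illustration only; `encoding` is an assumption,
    since the bytes are opaque to the kernel.
    """
    raw = name.encode(encoding)
    if 0 in raw:
        # A zero byte would end the path early, so it cannot appear inside one.
        raise ValueError("encoded name contains a zero byte")
    return raw + b"\x00"


name = "いぬのおまわりさん.flac"
encoded = as_path_bytes(name)

# The same name has different lengths depending on whether you count
# characters or bytes.
print(len(name), "characters")
print(len(encoded) - 1, "bytes before the terminating zero")
print(list(encoded[:6]))  # the first two characters, as small integers
```

Here the name is 14 characters but 32 bytes, which is why talking about paths as characters can mislead. Passing `encoding="utf-16-le"` raises `ValueError` in this sketch, because that encoding puts zero bytes inside the name.
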
> I would be happy in a world where all tools are restricted to the
> portable filename character set. I once toyed with a patch for
> enforcing such filenames in the kernel, just for fun.

I've been pleased to start moving GNU troff in the _opposite_ direction.

NEWS from the forthcoming 1.24.0 release:

*  GNU troff now strips a leading neutral double quote from the argument
   to the `cf`, `hpf`, `hpfa`, `mso`, `msoquiet`, `nx`, `pi`, `pso`,
   `so`, `soquiet`, `sy`, and `trf` requests, and the second argument to
   the `open` and `opena` requests, allowing it to contain embedded
   leading spaces.

*  GNU troff now accepts space characters in the argument to the `cf`,
   `hpf`, `hpfa`, `mso`, `msoquiet`, `nx`, `so`, `soquiet`, and `trf`
   requests, and the second argument to the `open` and `opena` requests.
   See "soelim" below.

*  soelim no longer requires embedded space characters in `so` arguments
   to be backslash-escaped. (It continues to support that syntax, even
   though neither AT&T nor GNU troff ever has.) If the argument to a
   `so` request must contain leading spaces, any such sequence of spaces
   must now be prefixed with a double quote character ("), which the
   program then discards. These changes are to better align this
   program's parsing rules with the language of the formatter; consider
   the `ds` and `as` requests.

In 1.25 I want to support the use of groff-style Unicode special
character escape sequences to encode byte sequences in file names.
Notice that I do say _bytes_, so the range will be limited: \[u0000] to
\[u00FF]. But that will be enough to encode UTF-8, or sickness like
UTF-16LE.

https://savannah.gnu.org/bugs/index.php?65108

> On the other hand, I see the usefulness for others in programs trying
> to work with other stuff. So the manual page makes sense, and I'll
> swallow my disagreement. :-)

[digression into software development philosophy follows]

You're joining the side of the angels.

Authors of literature (fiction, academic, legal, technical, etc.) tend
to be unimpressed by some of the limitations on representation that
systems programmers find obvious and sensible.

More generally, the whole reason the operating system exists is to
facilitate the efficient execution of _applications_ (or "jobs", as
their card decks were known in the days when a "monitor program" to
occupy a machine's idle cycles was a novel concept).

Systems programming (be it in the kernel per se or at the layer of
general services in user space) can definitely be a great place to spend
one's career, but we do best when we remember that it's not an end in
itself...lest we come to resemble those JavaScript fanatics who seem to
spend all their time fighting wars with each other over "frameworks".

Thus, if a groff user wants to name their document on-disk
"Обладала фактической самостоятельностью.ms", I feel pretty lame if I
tell them they can't.

When dealing with users, a principle I try to follow is to actively look
for ways to say "yes" to their requests. Reasons for saying "no"
generally don't need to be sought out--they present themselves with
depressing frequency.

Sometimes the user wants an impossible or infeasible thing: beyond the
obvious limitations of finite storage, CPU cycles, and I/O bandwidth,
some problems have high lower bounds on complexity. Occasionally someone
asks for something that blunders directly into an unsolved problem in
computer science.

I omit from the foregoing consideration the phenomenon of users who know
something--but often not enough--about implementation details, and
therefore have a tendency to design "solutions" in their heads and
request those instead of presenting their problem scenario. This type of
user shows up everywhere, but the bug-bash mailing list is especially
rich with them. With these people, before you can get to "yes" you have
to ask, "what is it you're trying to do?". Sometimes they just won't
tell you. With some of these people, the only way to stay sane is to
start with "no".

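As a concrete footnote to the file-name plan mentioned earlier (bytes written as \[u0000] through \[u00FF]), here is a minimal sketch in Python. It is not groff code, and the feature is only planned, so the details are assumptions: the function names are hypothetical, every byte is escaped (including plain ASCII), and exactly four uppercase hex digits are used per escape.

```python
import re

# Matches one escape of the assumed form \[uXXXX] (four uppercase hex digits).
_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4})\]")


def to_byte_escapes(raw: bytes) -> str:
    """Write each byte as \\[u00XX]; hypothetical encoder for illustration."""
    return "".join(f"\\[u{b:04X}]" for b in raw)


def from_byte_escapes(text: str) -> bytes:
    """Decode a string made only of \\[uXXXX] escapes back into bytes."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        if match.start() != pos:
            raise ValueError("unexpected text between escapes")
        value = int(match.group(1), 16)
        if value > 0xFF:
            # The range is limited to bytes, as the message says.
            raise ValueError("escape is outside \\[u0000]..\\[u00FF]")
        out.append(value)
        pos = match.end()
    if pos != len(text):
        raise ValueError("trailing text after the last escape")
    return bytes(out)


file_name = "Обладала фактической самостоятельностью.ms"
escaped = to_byte_escapes(file_name.encode("utf-8"))
assert from_byte_escapes(escaped).decode("utf-8") == file_name
print(escaped[:36])  # the first four bytes of the name, escaped
```

Because the escapes carry bytes rather than code points, the same scheme would round-trip UTF-16LE or any other encoding without the decoder needing to know which one was used.
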
"The three most dangerous things in the world are a programmer with a
soldering iron, a hardware type with a program patch, and a user with an
idea."
-- Rick Cook

Regards,
Branden