On Thu, 21 Jun 2018 21:46:03 +0200, Michael Kerrisk wrote:

>On 06/20/2018 06:25 PM, Michael Witten wrote:
>> This man page defines what "byte" means in the context of Linux
>> programming; it draws on various authoritative references, namely
>> Linus Torvalds's master's thesis, POSIX, and the C11 [draft]
>> standard. Each of these references is properly cited.
>>
>> The content has been laid out to render well in a pager that
>> provides at least 80 columns of monospace characters; it is best
>> viewed by `man' with at least one of the following environment
>> variable definitions:
>>
>>   COLUMNS=80
>>   MANWIDTH=80
>
> Thanks for sending this, but what's missing in this cover message
> is some explanation of why the page needed. It's not clear to me.
> Nor is the rationale clear from reading the start of the page. So,
> why is the page needed?

A programmer needs to hook into various interfaces to make things
work. Linux provides an interface, POSIX provides an interface, and
the C standard provides an interface; and, of course, there are many
other interfaces, some of which haven't even been built yet, but for
which a programmer might want to be fully prepared, and each of which
might itself target one of those Big interfaces while neglecting
another.

Though these Big Three interfaces are related, they're not actually
coupled all that strongly together; there's plenty of room for
disagreement both now and in the future, which is one reason why
Linus Torvalds writes in his master's thesis about the size of a byte
and the nature of data (the quote in the new man page is from the
section "Unresolved Issues", where he details concerns about
portability).

For one thing, standards are written to be ignored. When has anything
of moderate complexity in this world, let alone in computing, ever
really done what it's supposed to do? That's why the digital world
has been (was?)
built by hackers; it took clever folk who weren't afraid to connect
things together, but their intrepid spirit came not from saying "Hey,
it compiled without error when I pressed the 'play' button!", but
rather from knowing exactly how to connect things, especially on a
low level. Indeed, you don't have much of a programming environment
without a way to think about bytes or the sizes of data types.

I suspect that most programs only "work" in the sense that they
"happen to work"; these days, 32-bit computers are being routinely
dropped by software projects, and labeled "obsolete", despite the
fact that they are perfectly adequate machines and had been supported
without trouble for years if not decades. Why? Because that software
sucks, and was written without a shred of respect for the sizes (or
layouts) of data types.

On the LKML, you can find people commiserating over the horrors of
bit-fields, for the simple reason that they do not behave like they
should according to "Common Sense" (nevertheless, they satisfy every
one of the C standard's specifications... or lack thereof).
Similarly, how many problems have been caused over the years by a
lack of respect for endianness?

Such failures to meet the demands of an interface are a cultural
phenomenon that manifests from the dearth of documentation on these
topics, particularly now that computing has reached ever more lofty
heights of abstraction, shielding many a budding programmer from the
trial-by-fire that is coding near the metal.

Sure, Linux targets POSIX (or maybe POSIX now targets Linux), but
only so far. Only so far! The nature of the Linux kernel is such that
it is at best "POSIX-like", rather than "POSIX-compliant"; it's
driven more by backwards compatibility than adherence to the digital
dictates of a committee.
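
To make the endianness complaint above a little more concrete, here
is a minimal sketch in C of exchanging a 32-bit value in a fixed byte
order without depending on how the host lays out its integers. The
helper names `store_be32' and `load_be32' are hypothetical (they are
not from the man page or from any library), and the sketch assumes
that each array element carries exactly one octet of the value:

```c
#include <stdint.h>

/*
 * Hypothetical helper: write the low 32 bits of `v' into `out' in
 * big-endian order, one octet per element. Only shifts and masks are
 * used, so the host's own byte order never enters the picture.
 */
void store_be32(unsigned char out[4], uint_least32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/*
 * Hypothetical helper: rebuild the value from four octets stored in
 * big-endian order. Each element is masked to 8 bits in case
 * `unsigned char' is wider than an octet (CHAR_BIT > 8).
 */
uint_least32_t load_be32(const unsigned char in[4])
{
    return ((uint_least32_t)(in[0] & 0xFFu) << 24)
         | ((uint_least32_t)(in[1] & 0xFFu) << 16)
         | ((uint_least32_t)(in[2] & 0xFFu) << 8)
         |  (uint_least32_t)(in[3] & 0xFFu);
}
```

Copying the object representation instead (say, with memcpy() of an
`unsigned long') would bake in both the host's byte order and the
host's idea of how wide the type is.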
There's nothing stopping a determined soul from porting Linux to an
unusual architecture that does not have an 8-bit primitive; for the
sake of compatibility, that port would undoubtedly require a few
hacks to emulate an 8-bit interface, but that's just the kernel! The
user space is an entirely different domain, which might eschew POSIX
compliance (targeting instead just the looser constraints of the C
standard), and thereby place on the programmer the burden of
structuring data properly.

Even if there were the *strictest* compliance with POSIX, guess what?
An `unsigned short' under POSIX ain't necessarily 16 bits; like the C
standard, POSIX requires only that an `unsigned short' be capable of
representing *at least* 16 bits:

  http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/limits.h.html#tag_13_23_03_06

    {USHRT_MAX}
        Maximum value for an object of type unsigned short.
        Minimum Acceptable Value: 65 535

All of your code that uses an uninitialized `unsigned short' to read
in a single 2-byte datum is wrongheaded, even under POSIX; it just
"happens to work", at least for now. You've got to clear those
"extra" higher-order bits if you don't want them used inadvertently
in your calculation. That is, you've got to write a program that is
aware of the sizes of even basic data types.

As described by Linus's master's thesis, that's why the Linux kernel
targets a header-based "virtual machine" that provides
architecture-specific implementations of integer types with precise
widths (e.g., `u8', `u16', or `u32'); similarly, that's why this man
page mentions C99's fixed-width integer types (e.g., `uint8_t' and
`uint16_t').

Please recall that the compiler used to build the kernel need not be
the compiler used to build normal programs; while the aforementioned
port may require a hacked implementation of C to emulate `u8' in the
kernel source, there's no reason to suspect that such a hack is also
available for the user-space compiler.
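
As a rough illustration of the `unsigned short' point above, here is
a minimal sketch in C that treats the type as "at least 16 bits" and
masks explicitly wherever exactly 16 bits are meant. The helper names
are hypothetical (they appear neither in the man page nor in any
library); only USHRT_MAX and its guaranteed minimum of 65535 come
from the standards:

```c
#include <limits.h>

/*
 * Hypothetical helper: report whether `unsigned short' happens to be
 * exactly 16 bits wide here. The standards promise only that
 * USHRT_MAX is at least 65535.
 */
int ushort_is_exactly_16_bits(void)
{
    return USHRT_MAX == 0xFFFFu;
}

/*
 * Hypothetical helper: combine two octets (high-order first) into a
 * 16-bit value. Each input is masked to 8 bits in case CHAR_BIT > 8,
 * and the result starts from a fully defined value, so no
 * uninitialized higher-order bits can leak in.
 */
unsigned short u16_from_octets(unsigned char hi, unsigned char lo)
{
    unsigned int value = ((hi & 0xFFu) << 8) | (lo & 0xFFu);
    return (unsigned short)value;
}

/*
 * Hypothetical helper: add two 16-bit quantities, wrapping at 2^16.
 * Without the mask, the sum would wrap at USHRT_MAX + 1, which is
 * 2^16 only where `unsigned short' is exactly 16 bits wide.
 */
unsigned short u16_add(unsigned short a, unsigned short b)
{
    unsigned long sum = (unsigned long)a + (unsigned long)b;
    return (unsigned short)(sum & 0xFFFFul);
}
```

On the C99 side, <stdint.h> must always provide `uint_least16_t',
whereas `uint16_t' is optional and exists only where the
implementation really has a type of exactly that width; that is one
more reason not to assume that a hack available to the kernel's
compiler is also there in user space.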
The new man page explicitly discusses issues like this, but
concentrates more on the narrow topic of what a "byte" or a "char"
is. Perhaps the purpose of this man page would be more obvious if
other data types (like `short') were also listed in the SYNOPSIS, and
further discussed in the DESCRIPTION. Perhaps the man page should
also delve into Linux's integer types. What do you think?

The world is messy, and a programmer (more than anybody else) needs
to be aware of just how messy it is; a good programmer *wants* to
know how messy the world is, and a superb programmer enjoys thinking
about how to keep things tidy.

In short, I was scratching an itch. I wrote a man page that I
personally wish had already been written when I was beginning to
think about such things; certainly, even writing this man page has
helped me crystallize and organize my thoughts on the topic of
compatibility, portability, and dependable exchange of data across
interfaces.

I hold in mind a picture of a programmer, sitting eagerly before the
terminal of a software development environment, ready and willing to
read about the tools at one's disposal. What's a Linux Programmer's
Manual without a page on bits and bytes?

Sincerely,
Michael Witten