On Fri, 22 Jun 2018 05:54:57 -0000, Michael Witten wrote:
> On Thu, 21 Jun 2018 21:46:03 +0200, Michael Kerrisk wrote:
>> On 06/20/2018 06:25 PM, Michael Witten wrote:
>>> This man page defines what "byte" means in the context of Linux
>>> programming; it draws on various authoritative references, namely
>>> Linus Torvalds's master's thesis, POSIX, and the C11 [draft]
>>> standard. Each of these references is properly cited.
>>>
>>> The content has been laid out to render well in a pager that
>>> provides at least 80 columns of monospace characters; it is best
>>> viewed by `man' with at least one of the following environment
>>> variable definitions:
>>>
>>>     COLUMNS=80
>>>     MANWIDTH=80
>>
>> Thanks for sending this, but what's missing in this cover message
>> is some explanation of why the page is needed. It's not clear to
>> me. Nor is the rationale clear from reading the start of the page.
>> So, why is the page needed?
>
> A programmer needs to hook into various interfaces to make things
> work. Linux provides an interface, POSIX provides an interface, and
> the C standard provides an interface; and, of course, there are many
> other interfaces, some of which haven't even yet been built, but for
> which a programmer might want to be fully prepared, and which might
> themselves target one of those Big interfaces while neglecting
> another.
>
> Though these Big Three interfaces are related, they're not actually
> coupled all that strongly together; there's plenty of room for
> disagreement both now and in the future, which is one reason why
> Linus Torvalds writes in his master's thesis about the size of a
> byte and the nature of data (the quote in the new man page is from
> the section "Unresolved Issues", where he details concerns about
> portability).
>
> [...]
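The disagreement over the size of a byte can be checked when a program is compiled. What follows is a minimal sketch that assumes a C11 compiler, because it uses `static_assert' from <assert.h>. It encodes two places where POSIX is stricter than Standard C: CHAR_BIT must be exactly 8, and int must be at least 32 bits wide. As a result, it refuses to compile on an implementation that meets only the C standard's looser minimums. The helper `storage_bits' is a hypothetical name chosen for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, INT_MAX */
#include <stddef.h>   /* size_t */

/* Standard C guarantees only CHAR_BIT >= 8; POSIX requires exactly 8.
   A conforming C implementation with wider chars fails to compile here,
   and such an implementation cannot be POSIX-compliant anyway. */
static_assert(CHAR_BIT == 8, "POSIX requires CHAR_BIT == 8");

/* Standard C permits a 16-bit int; POSIX requires INT_MAX to be at
   least 2147483647, so int must be at least 32 bits wide. */
static_assert(INT_MAX >= 2147483647, "POSIX requires at least a 32-bit int");

/* Hypothetical helper: the number of bits of storage (value bits plus
   any padding bits) in an object that occupies `size` chars, as
   reported by sizeof. */
size_t storage_bits(size_t size)
{
    return size * CHAR_BIT;
}
```

On x86-64 Linux, both assertions hold, and storage_bits(sizeof(int)) yields 32.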
>
> There's nothing stopping a determined soul from porting Linux to an
> unusual architecture that does not have an 8-bit primitive; for the
> sake of compatibility, that port would undoubtedly require a few
> hacks to emulate an 8-bit interface, but that's just the kernel! The
> user space is an entirely different domain, which might eschew POSIX
> compliance (targeting instead just the looser constraints of the C
> standard), and thereby place on the programmer the burden of
> structuring data properly.
>
> Even if there were the *strictest* compliance to POSIX, guess what?
> An `unsigned short' under POSIX ain't necessarily 16 bits; like the
> C standard, POSIX requires only that an `unsigned short' be capable
> of representing *at least* 16 bits:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/limits.h.html#tag_13_23_03_06
>   {USHRT_MAX}
>       Maximum value for an object of type unsigned short.
>       Minimum Acceptable Value: 65 535
>
> All of your code that uses an uninitialized `unsigned short' to read
> in a single 2-byte datum is wrongheaded, even under POSIX; it just
> "happens to work", at least for now. You've got to clear those
> "extra" higher-order bits if you don't want them used inadvertently
> in your calculation. That is, you've got to write a program that is
> aware of the sizes of even basic data types.
>
> As described by Linus's master's thesis, that's why the Linux kernel
> targets a header-based "virtual machine" that provides
> architecture-specific implementations of integer types with precise
> widths (e.g., `u8', `u16', or `u32'); similarly, that's why this man
> page mentions C99's fixed-width integer types (e.g., `uint8_t',
> `uint16_t', etc.).
>
> [...]
>
> The new man page explicitly discusses issues like this, but
> concentrates more on the narrow topic of what a "byte" or a "char" is.
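One way to follow the advice about reading a 2-byte datum into an `unsigned short' is to avoid placing raw bytes in a possibly wider object at all. The following is a minimal sketch. It assumes the datum is stored as two octets in big-endian order, with the most significant octet first. The function name `decode_u16be' is hypothetical. The function builds the value with shifts and masks, so no uninitialized or "extra" higher-order bits can leak into the result. It returns `uint_least16_t', which Standard C always provides. It avoids the exact-width `uint16_t', which Standard C makes optional, although POSIX does require it.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* uint_least16_t */

/* Hypothetical helper: decode a 16-bit big-endian datum from two
   consecutive octets stored in unsigned chars. */
uint_least16_t decode_u16be(const unsigned char octets[2])
{
    assert(octets != NULL);

    /* Keep only the low 8 bits of each char, in case CHAR_BIT > 8. */
    unsigned int hi = octets[0] & 0xFFu;
    unsigned int lo = octets[1] & 0xFFu;

    /* Each half is at most 0xFF, so the combined value fits in 16 bits
       and nothing above bit 15 can be set. */
    return (uint_least16_t)((hi << 8) | lo);
}
```

For example, the octets 0x12 and 0x34 decode to 0x1234. The result is the same regardless of the host's endianness or the width of its `unsigned short'.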
> Perhaps the purpose of this man page would be more obvious if other
> data types (like `short') were also listed in the SYNOPSIS, and
> further discussed in the DESCRIPTION. Perhaps the man page should
> also delve into Linux's integer types.
>
> What do you think?

I've expanded the SYNOPSIS to include information that is perhaps
more widely useful, though I haven't yet expanded the DESCRIPTION to
follow suit; here is its ASCII rendering (without bolding or
italicizing, and without a couple of fancy Unicode characters for the
arrow symbols):

BYTE(7)                  Linux Programmer's Manual                 BYTE(7)

NAME
       byte - exactly 8 bits; the smallest addressable unit in the kernel
       char - at least 8 bits; the smallest addressable unit in C

SYNOPSIS
   Linux and POSIX (and modern computing)
       byte       <->  exactly 8 bits

       This is the definition of "byte" used throughout this manual.

   POSIX
       char       <->  exactly 1 byte
       short      <->  at least 2 bytes
       int        <->  at least 4 bytes
       long       <->  at least 4 bytes
       long long  <->  at least 8 bytes

   Standard C
       char       <->  at least 1 byte     <- Beware!
       short      <->  at least 2 bytes
       int        <->  at least 2 bytes    <- Beware!
       long       <->  at least 4 bytes
       long long  <->  at least 8 bytes

   IP16L32 (16-bit x86; "near" pointers)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 2 bytes
       long       <->  exactly 4 bytes
       a pointer  <->  exactly 2 bytes

   I16LP32 (16-bit x86; "far" pointers)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 2 bytes
       long       <->  exactly 4 bytes
       a pointer  <->  exactly 4 bytes

   ILP32 (32-bit x86)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 4 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 4 bytes

   IL32P64 or LLP64 (x86-64 on Windows)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 4 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   I32LP64 or LP64 (x86-64 on Linux and other Unix-like systems)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   ILP64 (SPARC64)
       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 8 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   SILP64 (Cray)
       char       <->  exactly 1 byte
       short      <->  exactly 8 bytes
       int        <->  exactly 8 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   ...

DESCRIPTION
       [...]

Sincerely,
Michael Witten
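A program can also tell apart the data models listed in the SYNOPSIS above. The following is a minimal sketch that assumes C11; the enumeration and both function names are hypothetical. sizeof counts chars, while the SYNOPSIS counts 8-bit bytes, so the sketch refuses to compile unless a char is exactly 8 bits. It then compares the sizes of short, int, long, and an object pointer against the rows of the table. It ignores long long, because the table omits that type for the 16-bit models.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* sizeof counts chars, whereas the SYNOPSIS counts 8-bit bytes; the two
   agree only when a char is exactly 8 bits wide. */
static_assert(CHAR_BIT == 8, "sizes below are in 8-bit bytes");

/* Hypothetical names for the data models listed in the SYNOPSIS. */
enum data_model {
    DM_UNKNOWN, DM_IP16L32, DM_I16LP32, DM_ILP32,
    DM_LLP64, DM_LP64, DM_ILP64, DM_SILP64
};

/* Hypothetical helper: classify a data model from the sizes, in bytes,
   of short (s), int (i), long (l), and an object pointer (p).
   Combinations that do not appear in the table yield DM_UNKNOWN. */
enum data_model classify_data_model(size_t s, size_t i, size_t l, size_t p)
{
    if (s == 2 && i == 2 && l == 4 && p == 2) return DM_IP16L32;
    if (s == 2 && i == 2 && l == 4 && p == 4) return DM_I16LP32;
    if (s == 2 && i == 4 && l == 4 && p == 4) return DM_ILP32;
    if (s == 2 && i == 4 && l == 4 && p == 8) return DM_LLP64;
    if (s == 2 && i == 4 && l == 8 && p == 8) return DM_LP64;
    if (s == 2 && i == 8 && l == 8 && p == 8) return DM_ILP64;
    if (s == 8 && i == 8 && l == 8 && p == 8) return DM_SILP64;
    return DM_UNKNOWN;
}

/* Hypothetical helper: classify the implementation this file is
   compiled for. Only object pointers (void *) are measured; function
   pointers may differ in size on some platforms. */
enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(short), sizeof(int),
                               sizeof(long), sizeof(void *));
}
```

When compiled for x86-64 Linux, current_data_model() should return DM_LP64. When the same source is compiled for 64-bit Windows, it should return DM_LLP64.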