On 7 Jan 2022, at 01:43, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > > On 2022-01-07 at 00:39:59, Jessica Clarke wrote: >> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: >>> If you want to get really language-lawyer-y about it, you can actually >>> argue that this is a compliant implementation of the C standard. >>> Integer types are permitted to have padding bits, and some combinations >>> of padding bits are allowed to be trap representations. Technically, in >>> our representation, the metadata bits are padding bits, because they do >>> not contribute to the precision like value bits. It is therefore the >>> case that the *value* of a uintptr_t still fits into a uintmax_t, but >>> the latter has no padding bits, and casting the latter to the former >>> yields a trap representation when further cast back to a pointer. This >>> may not the intent of the spec, and not how anyone thinks of it because >>> CHERI is the first implementation that pushes the boundary here, but >>> it’s technically legal under that interpretation. You may disagree with >>> the interpretation, and I don’t like to use it most of the time because >>> it’s complicated and involves yet more ill-defined parts of the spec >>> (e.g. it says arithmetic operations on valid values (they mean objects, >>> I assume, as the value only includes value bits, but the input could be >>> a trap representation on some implementations) never generate a trap >>> representation other than as part of an exceptional condition such as >>> an overflow, but nowhere defines what counts as an arithmetic >>> operation). >> >> >> So, no, C does not actually require what you say. It requires that void >> * -> uintptr_t -> void * give you a valid pointer. It requires that >> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we >> do, because the value is formed from only the value bits which >> contribute to the precision, which is 64 bits in this case, and >> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t, >> since uintptr_t’s precision is the same as uintmax_t’s, be always >> valid, which is is. But it does not require that that uintptr_t has the >> same representation as the original uintptr_t, which it does not for >> us. And therefore it does not require that casting that uintptr_t back >> to a void * yields a valid pointer. So if you want to really dig into >> the details of the standard, we are technically compliant, even if some >> might argue it’s not in the spirit of the standard. > > Sure, implementations are allowed to have padding bits. They're also > allowed, at the moment, to use signed-magnitude or ones' complement > integers, have CHAR_BIT greater than 8, have sizeof(char) == > sizeof(short), not implement any of the customary sizes of intN_t or > uintN_t, not provide uintptr_t, and use middle-endian numbers. > > However, if your ABI is only compliant in the face of those features > (especially when it could have been written in a way which would have > been compliant without the use of those features), it's intentionally > hostile to real-world developers, and I don't think we should support > it[0]. I'd be willing to revisit this if your ABI were defined in a > reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t), > without padding bits, where the alignment of pointers from malloc is > suitable for all types, and where the alignment of a type is no greater > than sizeof(type). > > I'm not opposed to a small amount of finagling for this case, but I am > very much opposed to defining your C ABI in an intentionally difficult > way. 128-bit integers in 64-bit Linux were not originally part of the C > ABIs and if the ABI is ill defined now, that's a historical accident. > But this is a new ABI for a new architecture and it could have been > defined in a responsible way, but wasn't. It’s not a choice. It is not possible to define uintmax_t as being able to safely round-trip any uintptr_t in a way that preserves pointer validity unless you want to outlaw 64-bit integers on 32-bit CHERI implementations, and good luck with that one, for reasons I have previously explained, so if you do not wish to support this C implementation then upstream Git will not work on CHERI/Arm’s Morello. Nothing in this patch causes additional burden on other architectures, in fact it likely improves performance on some due to increased buffer alignment, so this seems a rather unnecessary line to take. We have all of our fork of FreeBSD, kernel and userland, built and working on CHERI, as well as X11 and the core of a KDE desktop (KWin, Plasma, Kate, Okular, Gwenview, Konsole; we would have more but up until recently we’ve needed to cross-compile everything and that’s often a pain to get things building), plus WebKit’s JavaScriptCore complete with a working JIT, so Git would be a notable omission in the stack, and rather important for developers to actually get work done on their boards. Note that C requires us to define those bits as padding bits; the range of an integer type is defined to be 0 to 2^N-1, where N is the number of value bits, so it would be impossible to have any metadata in a uintptr_t. So I take issue with “in a responsible way”. CHERI has been in development for over 10 years, it’s been done in collaboration with Arm, with people intimately familiar with C semantics and with extensive analysis of what real-world software expects of C in order to come up with the most-compatible implementation we reasonably can that’s still feasible to implement in hardware and provides the desired protection properties. Declaring it broken/hostile/etc feels like throwing the baby out with the bath water; for industry to move away from the sorry state of affairs that we still see with C and C++ where memory safety vulnerabilities are ubiquitous you cannot just restrict new ABIs to the status quo, as you just won’t make any progress. > As an aside, I was actually going to point out that you could propose a > nice Rust or Go ABI with the status quo, but if your C ABI requires > padding bits, then you're probably going to have a hard time doing so, > since I don't believe those languages support padding bits and they need > to support the C ABI. There is no such restriction in Rust that I can see. It’s true you’d need to slightly tweak Go’s specification as uintptr is currently an integer type and n-bit integer types are defined to be n-bit two’s complement values. I see no reason why that couldn’t be done though, I doubt much cares. Rust only defines the representation of its fixed-width integer types. There’s no reason you couldn’t add an additional integer type for a capability, like we do for C. The only issue with Rust is that it conflates uintptr_t with size_t, for which there are various discussion threads with Rust maintainers and a path forward for making it work if someone has the time to invest in making Rust work. Languages can evolve; there’s no reason they can’t for CHERI like they do for anything else they want to add, change, deprecate or remove. But this is all irrelevant to discussions of C. Jess > [0] For the record, I care strongly about portability, and I would not > accept a runtime having any of the qualities I mentioned in the first > paragraph. > -- > brian m. carlson (he/him or they/them) > Toronto, Ontario, CA