On 7 Jan 2022, at 02:08, Jessica Clarke <jrtc27@xxxxxxxxxx> wrote: > > On 7 Jan 2022, at 01:43, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: >> >> On 2022-01-07 at 00:39:59, Jessica Clarke wrote: >>> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: >>>> If you want to get really language-lawyer-y about it, you can actually >>>> argue that this is a compliant implementation of the C standard. >>>> Integer types are permitted to have padding bits, and some combinations >>>> of padding bits are allowed to be trap representations. Technically, in >>>> our representation, the metadata bits are padding bits, because they do >>>> not contribute to the precision like value bits. It is therefore the >>>> case that the *value* of a uintptr_t still fits into a uintmax_t, but >>>> the latter has no padding bits, and casting the latter to the former >>>> yields a trap representation when further cast back to a pointer. This >>>> may not the intent of the spec, and not how anyone thinks of it because >>>> CHERI is the first implementation that pushes the boundary here, but >>>> it’s technically legal under that interpretation. You may disagree with >>>> the interpretation, and I don’t like to use it most of the time because >>>> it’s complicated and involves yet more ill-defined parts of the spec >>>> (e.g. it says arithmetic operations on valid values (they mean objects, >>>> I assume, as the value only includes value bits, but the input could be >>>> a trap representation on some implementations) never generate a trap >>>> representation other than as part of an exceptional condition such as >>>> an overflow, but nowhere defines what counts as an arithmetic >>>> operation). >>> >>> >>> So, no, C does not actually require what you say. It requires that void >>> * -> uintptr_t -> void * give you a valid pointer. It requires that >>> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we >>> do, because the value is formed from only the value bits which >>> contribute to the precision, which is 64 bits in this case, and >>> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t, >>> since uintptr_t’s precision is the same as uintmax_t’s, be always >>> valid, which is is. But it does not require that that uintptr_t has the >>> same representation as the original uintptr_t, which it does not for >>> us. And therefore it does not require that casting that uintptr_t back >>> to a void * yields a valid pointer. So if you want to really dig into >>> the details of the standard, we are technically compliant, even if some >>> might argue it’s not in the spirit of the standard. >> >> Sure, implementations are allowed to have padding bits. They're also >> allowed, at the moment, to use signed-magnitude or ones' complement >> integers, have CHAR_BIT greater than 8, have sizeof(char) == >> sizeof(short), not implement any of the customary sizes of intN_t or >> uintN_t, not provide uintptr_t, and use middle-endian numbers. >> >> However, if your ABI is only compliant in the face of those features >> (especially when it could have been written in a way which would have >> been compliant without the use of those features), it's intentionally >> hostile to real-world developers, and I don't think we should support >> it[0]. I'd be willing to revisit this if your ABI were defined in a >> reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t), >> without padding bits, where the alignment of pointers from malloc is >> suitable for all types, and where the alignment of a type is no greater >> than sizeof(type). >> >> I'm not opposed to a small amount of finagling for this case, but I am >> very much opposed to defining your C ABI in an intentionally difficult >> way. 128-bit integers in 64-bit Linux were not originally part of the C >> ABIs and if the ABI is ill defined now, that's a historical accident. >> But this is a new ABI for a new architecture and it could have been >> defined in a responsible way, but wasn't. > > It’s not a choice. It is not possible to define uintmax_t as being able > to safely round-trip any uintptr_t in a way that preserves pointer > validity unless you want to outlaw 64-bit integers on 32-bit CHERI > implementations, and good luck with that one, for reasons I have > previously explained, so if you do not wish to support this C > implementation then upstream Git will not work on CHERI/Arm’s Morello. > Nothing in this patch causes additional burden on other architectures, > in fact it likely improves performance on some due to increased buffer > alignment, so this seems a rather unnecessary line to take. We have all > of our fork of FreeBSD, kernel and userland, built and working on > CHERI, as well as X11 and the core of a KDE desktop (KWin, Plasma, > Kate, Okular, Gwenview, Konsole; we would have more but up until > recently we’ve needed to cross-compile everything and that’s often a > pain to get things building), plus WebKit’s JavaScriptCore complete > with a working JIT, so Git would be a notable omission in the stack, > and rather important for developers to actually get work done on their > boards. > > Note that C requires us to define those bits as padding bits; the range > of an integer type is defined to be 0 to 2^N-1, where N is the number > of value bits, so it would be impossible to have any metadata in a > uintptr_t. > > So I take issue with “in a responsible way”. CHERI has been in > development for over 10 years, it’s been done in collaboration with > Arm, with people intimately familiar with C semantics and with > extensive analysis of what real-world software expects of C in order to > come up with the most-compatible implementation we reasonably can > that’s still feasible to implement in hardware and provides the desired > protection properties. Declaring it broken/hostile/etc feels like > throwing the baby out with the bath water; for industry to move away > from the sorry state of affairs that we still see with C and C++ where > memory safety vulnerabilities are ubiquitous you cannot just restrict > new ABIs to the status quo, as you just won’t make any progress. > >> As an aside, I was actually going to point out that you could propose a >> nice Rust or Go ABI with the status quo, but if your C ABI requires >> padding bits, then you're probably going to have a hard time doing so, >> since I don't believe those languages support padding bits and they need >> to support the C ABI. > > There is no such restriction in Rust that I can see. It’s true you’d > need to slightly tweak Go’s specification as uintptr is currently an > integer type and n-bit integer types are defined to be n-bit two’s > complement values. I see no reason why that couldn’t be done though, I > doubt much cares. Rust only defines the representation of its > fixed-width integer types. There’s no reason you couldn’t add an > additional integer type for a capability, like we do for C. The only > issue with Rust is that it conflates uintptr_t with size_t, for which > there are various discussion threads with Rust maintainers and a path > forward for making it work if someone has the time to invest in making > Rust work. Languages can evolve; there’s no reason they can’t for CHERI > like they do for anything else they want to add, change, deprecate or > remove. > > But this is all irrelevant to discussions of C. Oh and: > where the alignment of pointers from malloc is > suitable for all types This is true of CHERI. > and where the alignment of a type is no greater > than sizeof(type). This is true of CHERI, and it’s impossible for that not to be true. Otherwise arrays don’t work. Jess