Re: Key endianness?

Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> · Mon, 21 Oct 2019 21:47:19 +0200

On Mon, 21 Oct 2019 at 21:14, Pascal Van Leeuwen
<pvanleeuwen@xxxxxxxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > Sent: Monday, October 21, 2019 6:15 PM
> > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> > Cc: linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: Key endianness?
> >
> > On Mon, 21 Oct 2019 at 17:55, Pascal Van Leeuwen
> > <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > > > Sent: Monday, October 21, 2019 5:32 PM
> > > > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> > > > Cc: linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx
> > > > Subject: Re: Key endianness?
> > > >
> > > > p[
> > > >
> > > > On Mon, 21 Oct 2019 at 17:23, Pascal Van Leeuwen
> > > > <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > > > > > Sent: Monday, October 21, 2019 2:54 PM
> > > > > > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> > > > > > Cc: linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx
> > > > > > Subject: Re: Key endianness?
> > > > > >
> > > > > > On Mon, 21 Oct 2019 at 14:40, Pascal Van Leeuwen
> > > > > > <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> > > > > > > > Sent: Monday, October 21, 2019 1:59 PM
> > > > > > > > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> > > > > > > > Cc: linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx
> > > > > > > > Subject: Re: Key endianness?
> > > > > > > >
> > > > > > > > On Mon, 21 Oct 2019 at 12:56, Pascal Van Leeuwen
> > > > > > > > <pvanleeuwen@xxxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > Another endianness question:
> > > > > > > > >
> > > > > > > > > I have some data structure that can be either little or big endian,
> > > > > > > > > depending on the exact use case. Currently, I have it defined as u32.
> > > > > > > > > This causes sparse errors when accessing it using cpu_to_Xe32() and
> > > > > > > > > Xe32_to_cpu().
> > > > > > > > >
> > > > > > > > > Now, for the big endian case, I could use htonl()/ntohl() instead,
> > > > > > > > > but this is inconsistent with all other endian conversions in the
> > > > > > > > > driver ... and there's no little endian alternative I'm aware of.
> > > > > > > > > So I don't really like that approach.
> > > > > > > > >
> > > > > > > > > Alternatively, I could define a union of both a big and little
> > > > > > > > > endian version of the data but that would require touching a lot
> > > > > > > > > of legacy code (unless I use a C11 anonymous union ... not sure
> > > > > > > > > if that would be allowed?) and IMHO is a bit silly.
> > > > > > > > >
> > > > > > > > > Is there some way of telling sparse to _not_ check for "correct"
> > > > > > > > > use of these functions for a certain variable?
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > In this case, just use (__force __Xe32*) to cast it to the correct
> > > > > > > > type. This annotates the cast as being intentionally endian-unclean,
> > > > > > > > and shuts up Sparse.
> > > > > > > >
> > > > > > > Thanks for trying to help out, but that just gives me an
> > > > > > > "error: not an lvalue" from both sparse and GCC.
> > > > > > > But I'm probably doing it wrong somehow ...
> > > > > > >
> > > > > >
> > > > > > It depends on what you are casting. But doing something like
> > > > > >
> > > > > > u32 l = ...
> > > > > > __le32 ll = (__force __le32)l
> > > > > >
> > > > > > should not trigger a sparse warning.
> > > > > >
> > > > > I was actually casting the left side, not the right side,
> > > > > as that's where my sparse issue was. Must be my poor grasp
> > > > > of the C language hurting me here as I don't understand why
> > > > > I'm not allowed to cast an array element to a different type
> > > > > of the _same size_ ...
> > > > >
> > > > > i.e. why can't I do (__be32)some_u32_array[3] = cpu_to_be32(some_value)?
> > > > >
> > > >
> > > > Because you can only change the type of an expression by casting, and
> > > > an lvalue is not an expression. A variable has a type already, and you
> > > > cannot cast that away - what would that mean, exactly? Would all
> > > > occurrences of some_u32_array[] suddenly have a different type? Or
> > > > only element [3]?
> > > >
> > > I think it would be perfectly logical to do such a cast and I'm really
> > > surprised that it is not legal. Obviously, it would only apply to the
> > > actual assignment it is used with. It's a cast, not a redefinition.
> > > After all, a variable or an array item is just some storage area in
> > > memory. Why shouldn't I be able to write to it _as if_ it is some
> > > different type (if I know what I'm doing and especially if it is the
> > > exact same size in memory)?
> > >
> > > >
> > > > > I managed to work around it by doing *(__be32 *)&some_u32_array[3] =
> > > > > but that's pretty ugly ... a better approach is still welcome.
> > > > >
> > > >
> > > > You need to cast the right hand side, not the left hand side. If
> > > > some_u32_array is u32[], force cast it to (__force u32)
> > > >
> > >
> > > Sure, you can do the casting on the right hand side, but that may not
> > > convey what you _really_ want to do, particularly in this case.
> > > As I _really_ want to write a big endian word there. I don't want to
> > > pretend I loose the endianness somewhere along the way. That written
> > > word is still very much big endian.
> > > (I know practically it makes no difference, but casting the left side
> > > would just be so much clearer IMHO)
> > >
> >
> >
> > No, it really isn't, and I am tired of having another endless debate about this.
> >
> I was not intending to start a debate - I was just being a newby C programmer
> being _honestly surprised_ by this limitation. You (or I, anyway) would just expect
> it to work. The rest is personal taste, which is is not debatable (it just is).
>

But it is not a limitation. You are arguing that it is natural and
obvious to cast the expression, but it is not. Really.

(be32)lvalue = <expression of type be32>

would actually mean that the operation applied to the expression is
the opposite, given the lvalue is *not* a be32 to begin with, and so
you expect that lvalue is made to be a be32 for the purpose of the
assignment, after which it is converted back to what it was before? In
this particular case, this comes down to doing the inverse cast on the
expression. But what happens is there is no inverse cast? What would
it mean, for instance, to do

(u16)lvalue = <expression of type s16>

when lvalue is of type s16 itself? Does the sign bit become a data bit
now? Do we have to keep track of this throughout the code?

The bottom line is that the cast operation is only defined for values,
not for variables. A variable's type is fixed - you cannot change it
for the duration of a cast operation and expect the compiler to infer
what this might mean in the general case, even if you think your
endianness conversion example is an obvious case.

> > C permits casting of expressions, not of lvalues.
> > ...
> > > > > >
> > > > > > > > If the
> > > > > > > > hardware chooses to reorder those bytes for some reason, it is the
> > > > > > > > responsibility of the driver to take care of that from the CPU side.
> > > > > > > >
> > > > > > > Which still requires you to know the byte order as used by the API.
> > > > > > >
> > > > > >
> > > > > > Only if API means the AES or ChaCha specific helper routines that we
> > > > > > have in the kernel. If you are using the AES helpers, then yes, you
> > > > > > need to ensure that you use the same convention. But the algorithms
> > > > > > themselves are fully defined by their specification, and so what other
> > > > > > implementations in the kernel do is not really relevant.
> > > > > >
> > > > > What is relevant is what the API expects
> > > >
> > > > But *which* API? The skcipher API uses u8[] for in/output and keys,
> > > > and how these byte arrays are interpreted is not (and cannot) be
> > > > defined at this level of abstraction.
> > > >
> > > Yes, skcipher API. Obviously. As that's what we're talking about.
> > > And _of course_ it has to be defined at that level of abstraction.
> > > (which doesn't preclude inheriting it from some other specification)
> > > Otherwise you would not able to e.g. exchange keys between different
> > > platforms.
> > >
> >
> > So what exactly are you suggesting? That the skcipher API should
> > specify that the key is u8[], unless the algo in question operates on
> > 32-bit words, in which case it is le32[], unless the algo in question
> > operates on 64-bit words, in which case it is le64[] etc etc? Do you
> > seriously think that at the skcipher API level we should mandate all
> > of that? That is insane.
> >
> You think being _clear_ on the actual byte order is _insane_? Seriously?

That is not what I said.

I said that an abstract API that reasons about unspecified algorithms
taking inputs and output of an unspecified nature, using keys of an
unspecified nature should not be expected to define how such
unspecified quantities are organized if they happen to be interpreted
as multi-byte words by some of its implementations.

> Like I said, inheriting that from the algorithm spec is fine but not all
> algorithm specs are clear and unambiguous w.r.t. byte order.
>

That doesn't make it the job of the abstraction that is layered on top
to define it.

> I know this is not crypto perse, but what would be the logical byte order
> for a CRC32 "key" considering it is a 32 bit word? The CRC32 definition
> sure isn't going to help you there, it is specified on 32 bit words only.
> So in such cases it must be clear how the byte stream maps onto the word.
>

Yes, so for the shash implementations of "crc32" and "crc32c", it is
defined by the implementation. But that doesn't mean shash should
specify that this is the case for all algorithms.

> > > >
> > > > > ... and from 20 years of
> > > > > experience I would say many algorithm specifications are not exactly
> > > > > very clear on the byte order at all, often assuming this to be
> > > > > "obvious". (and if it's not little-endian, it's not obvious to me ;-)
> > > > >
> > > >
> > > > I agree that not all specs are crystal clear on this. But it is still
> > > > the algorithm that needs to define this.
> > > >
> > > In an ideal world, probably. In the real world, it is entirely possible
> > > for an implementation to expect the key bytes in a different order.
> > > Would not be the first time I run into that.
> > >
> >
> > Of course. But that is not the point. The skcipher API cannot possibly
> > reason about byte orders of all current and future algorithms that it
> > may ever encapsulate.
> >
> Well, so far I got the impression that at least the intention is
> to follow the cipher specification. That's a start. You could add
> some generic rules on top of that, like "if the specification is
> word based and does not mandate a byte order, than these words
> shall be stored in little-endian byte order". It's not that hard.
> You really don't need to know "future algorithm" details for that.
>

I am not saying it is hard. I am saying it is wrong. The skcipher
API's job is not to fill holes in the algorithm's specification.