RE: another testmgr question

Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxxxx> · Mon, 27 May 2019 12:22:13 +0000

>
> I understand that. But even if the application is synchronous, it does
> not mean that the whole world stops and nothing is using the
> accelerator in the mean time.
>
I understand that as well. But that doesn't change the fact that the
application may be waiting for a loooooong (relatively speaking) time
for its results, as latency through hardware may be several orders of
magnitude larger than the time it actually takes to *process* the
request. So when used synchronously the HW may appear to work at a mere
fraction of its true performance.
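
To put illustrative numbers on this, here is a minimal sketch. The latency
and processing figures are hypothetical assumptions chosen for the example,
not measurements of any real engine, and the function names are made up.

```python
# Hypothetical figures, chosen only to illustrate the argument.
LATENCY_S = 10e-6      # assumed round trip through the hardware per request
PROCESSING_S = 0.1e-6  # assumed time the engine actually spends processing


def sync_throughput(latency_s: float) -> float:
    """Requests/s when the caller waits for each result before submitting the next."""
    return 1.0 / latency_s


def pipelined_throughput(processing_s: float) -> float:
    """Requests/s when enough requests are in flight to keep the engine busy."""
    return 1.0 / processing_s


if __name__ == "__main__":
    sync = sync_throughput(LATENCY_S)
    piped = pipelined_throughput(PROCESSING_S)
    print(f"synchronous caller sees {sync / piped:.1%} of the engine's capacity")
```

With these assumed numbers the latency is two orders of magnitude larger
than the processing time, so a strictly synchronous caller sees about one
percent of what the engine could deliver when kept busy.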

And if your main interest is in that application, you may not care so
much about what the rest of the system does, even if it can use the
remaining bandwidth of the accelerator.

In which case it may be desirable *not* to use the accelerator for that
application at all due to *very* poor performance (for that application).

Which would make even more cycles on the accelerator available to the
other applications in the system, so that knife cuts both ways ...

> > > This is made worse by the priority scheme, which does not really
> > > convey information like this.
> > >
> > Yes, the priority scheme is far too simplistic to cover all details
> > regarding hardware acceleration. Which is why we probably shouldn't use
> > it to select hardware drivers at all.
> >
> > > > But then again that would still be too simplistic to select the best
> > > > driver under all possible circumstances ... so why even bother.
> > > >
> > > > > flag for that. But even if that does happen, it doesn't mean you can
> > > > > stop caring about zero length inputs :-)
> > > > >
> > > > If the selection of the hardware driver becomes explicit and not
> > > > automatic, you could argue for a case where the driver does NOT have
> > > > to implement all dark corners of the API. Since, as a hardware vendor,
> > > > we could simply recommend NOT to use it for application XYZ because
> > > > it does things - like zero length messages - we don't support.
> > > >
> > >
> > > Spoken like a true h/w guy :-)
> > >
> > Guilty as charged. I AM a true H/W guy and not a software engineer at all.
> > But have you ever stopped to wonder WHY all hardware guys talk like that?
> > Maybe, just maybe, they have a damn good reason to do so ...
> >
>
> Of course. And so do we. And that is why we meet in the middle to compromise.
>
Yes, we try where we can. But you have to remember that ultimately hardware
is bound by the limitations of the physical world. Which doesn't compromise :-)
And compromises have consequences that need to be carefully considered.

Adding tons of workarounds to drivers, for example, slows them down, makes them
use more CPU cycles and more power, and ultimately defeats the purpose of having
a hardware accelerator at all. That is actually my concern.
And as an aside, once workarounds have been implemented and proven to "work", the
underlying issue rarely makes it to the HW guys so we're stuck with it forever.

> > > Our crypto s/w stack and the storage, networking and other subsystems
> > > that are layered on top of it are complex enough that we shouldn't try
> > > to cater for non-compliant hardware. This is why you need to fix this
> > > in your driver: to prevent the issue from leaking into other layers,
> > > making it even more difficult to do testing and validation.
> > >
> > Now where am I suggesting that applications should cater for non-compliant
> > hardware? I'm simply suggesting that you should NOT use the hardware for
> > such an application at all. If you make it explicit, you can do that.
> >
> > And besides, who decides what is "compliant" and what the rules are?
>
> If the algorithm in question is defined for zero length inputs, but
> the h/w chooses not to implement that case, I think non-compliant is a
> rather nice way to say 'broken'.
>
NO. Hardware is broken if it doesn't comply with its own specifications -
which *may* include references to industry standards it must comply with.
If I intentionally specify that zero length hashes are not supported, and
I don't pretend to comply with any industry standard that requires them,
then that's just a *limitation* of the hardware, most certainly not a bug.
Which may be perfectly valid as hardware is usually created for specific
use cases.
In the case of the Inside Secure HW/driver: mainly IPsec and perhaps disk
encryption, but certainly not Ye Olde's basic random crypto request.

Hardware necessarily *always* has limitations because of all kinds of
constraints: area, power, complexity. And even something as mundane as a
schedule constraint where you simply can't fit all desired features in the
desired schedule. Which is usually very solid due to timeslots being
planned in a fab etc. We don't have the luxury of extending our schedule
forever like SW guys tend to do ... we're very proud of our track record
of always meeting our promised schedules. Plus - silicon can't be patched,
so what's done is done and you have to live with it. For many years to
come, usually.

> I know there is a gradient here going
> from hashes, AEADs to symmetric ciphers, but I think this applies to
> all of them.
>
> > Please keep in mind that existing hardware cannot be changed. So why
> > wasn't the API designed around the limitations of *existing* hardware?
>
> From a software point of view, adding special cases for zero length
> inputs amounts to what you are trying to avoid: using more 'silicon
> area'.
>
No, that's actually not the reason at all in this case. We're trying to
avoid significant extra complexity and effort on both the hardware itself
and the verification thereof. Silicon area is not even in the picture as
a concern for something as "small" as this.

Adding zero length support to our hardware architecture is not a trivial
exercise. And then you have to weigh added complexity (= added risk, when
you talk about hardware with multi-million dollar mask sets in play)
against usefulness. Zero-length support was - and still is! - simply not
worth the added risk and effort.

> Proper validation requires coverage based testing, i.e., that all
> statements in a program can be proven to be exercised by some use
> case, and produce the correct result.
>
> This means that, if we have to add 'if (message_length > 0) { do this;
> } else { do that; }' everywhere, we are moving the effort from your
> corner to mine. Of course I am going to oppose that :-)
>
> > It can take several years for a hardware fix to reach the end user ...
> >
>
> While software implementations can sometimes be fixed quickly,
> software APIs have *really* long lifetimes as well, especially in the
> server space. And until you have reached sufficient coverage with your
> updated API, you are stuck with both the old one and the new one, so
> you have even more code to worry about.
>
> So a crypto API where zero length inputs are not permitted or treated
> specially is not the way to fix this.
>
Well, for one thing even FIPS certification allows zero lengths not to be
supported by an implementation. So there's definitely prior art to that.
You could handle this by means of capability flags or profiles or whatever.
But I was not even going that far in my suggestions.
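
For illustration only, the capability-flag idea could look roughly like the
sketch below. Every name in it (Caps, Driver, pick_driver, the flag names)
is hypothetical and not part of any existing API; it only shows selection
by advertised capabilities on top of a plain priority value.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional, Sequence


class Caps(Flag):
    """Hypothetical capability bits a driver could advertise."""
    ZERO_LENGTH = auto()  # accepts zero-length messages
    SCATTERLIST = auto()  # accepts fragmented input buffers


@dataclass(frozen=True)
class Driver:
    name: str
    priority: int
    caps: Caps


def pick_driver(drivers: Sequence[Driver], required: Caps) -> Optional[Driver]:
    """Return the highest-priority driver advertising every required capability."""
    usable = [d for d in drivers if (d.caps & required) == required]
    return max(usable, key=lambda d: d.priority, default=None)


if __name__ == "__main__":
    hw = Driver("hw-engine", 300, Caps.SCATTERLIST)
    sw = Driver("generic-sw", 100, Caps.ZERO_LENGTH | Caps.SCATTERLIST)
    # A user that needs zero-length support is steered away from the hardware.
    print(pick_driver([hw, sw], Caps.ZERO_LENGTH).name)
    # A user with no special requirements gets the highest priority driver.
    print(pick_driver([hw, sw], Caps(0)).name)
```

A user that never sends zero-length messages would simply not request that
flag and would still get the hardware driver.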

I was merely suggesting that IF a driver needs to be explicitly selected to
be used, THEN you could allow that driver to be not fully compliant to some
extent. And then the driver could come with a README or so - maintained by
the HW vendor - detailing which use cases have actually been validated with
it.

> > As for testing and validation: if the selection is explicit, then the
> > responsibility for the testing and validation can move to the HW vendor.
> >
>
> I think the bottom line is still to fix the driver and be done with
> it. I honestly don't care about what exactly your h/w supports, as
> long as the driver that encapsulates it addresses the impedance
> mismatch between what the h/w layer provides and what the upper layer
> expects.
>
And if you go that naive route, just fix everything in the driver, then
you simply end up with something terribly inefficient because all those
corner case checks end up in the fast path and eating up code space.
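
A minimal sketch of what such a workaround looks like, using SHA-256 and a
software fallback. The function names are hypothetical and hw_sha256 is only
a stand-in that mimics an engine rejecting empty input; it is not how any
real driver is written.

```python
import hashlib


def hw_sha256(data: bytes) -> bytes:
    """Stand-in for an accelerator that does not support zero-length input."""
    if len(data) == 0:
        raise ValueError("zero-length input not supported by this engine")
    # Simulated here in software; a real driver would submit to hardware.
    return hashlib.sha256(data).digest()


def driver_sha256(data: bytes) -> bytes:
    """Driver entry point that hides the hardware limitation from callers."""
    # This check is evaluated for every request, including the overwhelming
    # majority that are not empty: that is the fast-path cost being discussed.
    if len(data) == 0:
        return hashlib.sha256(b"").digest()  # software fallback
    return hw_sha256(data)


if __name__ == "__main__":
    print(driver_sha256(b"").hex())
    print(driver_sha256(b"abc").hex())
```

One such branch is cheap. The concern above is what happens when every
corner case of the API gets its own check and fallback path.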

For someone claiming to "meet in the middle to compromise" you're
surely not compromising anything at all ... No offense.

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
www.insidesecure.com