Re: [RFC PATCH 00/11] nvme: In-band authentication support

Hannes Reinecke <hare@xxxxxxx> · Wed, 21 Jul 2021 08:06:58 +0200

On 7/20/21 10:26 PM, Vladislav Bolkhovitin wrote:
> Hi,
> 
> Great to see those patches coming! After some review, they look to be
> very well done. Some comments/suggestions below.
> 
> 1. I strongly recommend to implement DH exponentials reuse (g x mod p /
> g y mod p as well as g xy mod p) as specified in section 8.13.5.7
> "DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I
> had a prototype that demonstrated that DH math has quite significant
> latency, something like (as far as I remember) 30ms for 4K group and few
> hundreds of ms for 8K group. For single connection it is not a big deal,
> but imagine AMD EPYC with 128 cores. Since all connections are created
> sequentially, even with 30 ms per connection time to complete full
> remote device connection would become 128*30 => almost 4 seconds. With
> 8K group it might be more than 10 seconds. Users are unlikely going to
> be happy with this, especially in cases, when connecting multiple of
> NVMe-oF devices is a part of a server or VM boot sequence.
> 
Oh, indeed, I can confirm that. FFDHE calculations are quite time-consuming.
But incidentally, ECDH and curve25519 are reasonably fast, so maybe
there _is_ a value in having a TPAR asking for them to be specified, too ...

> If DH exponential reuse implemented, for all subsequent connections the
> DH math is excluded, so authentication overhead becomes pretty much
> negligible.
> 
> In my prototype I implemented DH exponential reuse as a simple
> per-host/target cache that keeps DH exponentials (including g xy mod p)
> for up to 10 seconds. Simple and sufficient.
> 

Frankly, I hadn't looked at exponential reuse; this implementation
really is just a first step to get feedback from people if this is a
direction they want to go.

> Another, might be ever more significant reason why DH exponential reuse
> is important is that without it x (or y on the host side) must always be
> randomly generated each time a new connection is established. Which
> means, for instance, for 8K groups for each connection 1KB of random
> bytes must be taken from the random pool. With 128 connections it is now
> 128KB. Quite a big pressure on the random pool that DH exponential reuse
> mostly avoids.
> 
> Those are the 2 reasons why we added this DH exponential reuse sentence
> in the spec. In the original TP 8006 there was a small informative piece
> explaining reasonings behind that, but for some reasons it was removed
> from the final version.
> 

Thanks for the hint. I'll be adding exponential reuse to the code.

> 2. What is the status of this code from perspective of stability in face
> of malicious host behavior? Seems implementation is carefully done, but,
> for instance, at the first look I was not able to find a code to clean
> up if host in not acting for too long in the middle of exchange. Other
> observation is that in nvmet_execute_auth_send()
> nvmet_check_transfer_len() does not check if tl size is reasonable,
> i.e., for instance, not 1GB.
> 

That is true; exchange timeouts are missing. Will be adding them, of
course. And haven't thought of checking for tl size overflows; will be
adding them, too.

> For sure, we don't want to allow remote hosts to hang or crash target.
> For instance, because of OOM conditions that happened, because malicious
> host asked target to allocate too much memory or open to many being
> authenticated connections in which the host is not going to reply in the
> middle of exchange.
> 
This is something I'll need to look at, anyway. What we do not want is a
userspace application chipping in and send a 'negotiate' command without
any subsequent steps, thereby invalidating the existing authentication.

> Asking, because don't want to go in my review too far ahead from the
> author ;)
> 
> In this regard, it would be great if you add in your test application
> ability to perform authentication with random parameters and randomly
> stop responding. Overnight running of such test would give us good
> degree of confidence that it will always work as expected.
> 

That indeed would be good; let me think on how something like that can
be implemented.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@xxxxxxx			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer