Re: [PATCH v6 0/5] /dev/random - a new approach

On Thursday, 11 August 2016 at 17:36:32 CEST, Theodore Ts'o wrote:

Hi Theodore,

> On Thu, Aug 11, 2016 at 02:24:21PM +0200, Stephan Mueller wrote:
> > The following patch set provides a different approach to /dev/random which
> > I call Linux Random Number Generator (LRNG) to collect entropy within the
> > Linux kernel. The main improvements compared to the legacy /dev/random are
> > to provide sufficient entropy during boot time as well as in virtual
> > environments and when using SSDs. A secondary design goal is to limit the
> > impact of the entropy collection on massively parallel systems and also to
> > allow the use of accelerated cryptographic primitives. Also, all steps of
> > the entropic data processing are testable. Finally, massive performance
> > improvements are visible at /dev/urandom and get_random_bytes.
> > 
> > The design and implementation is driven by a set of goals described in [1]
> > that the LRNG completely implements. Furthermore, [1] includes a
> > comparison with RNG design suggestions such as SP800-90B, SP800-90C, and
> > AIS20/31.
> 
> Given the changes that have landed in Linus's tree for 4.8, how many
> of the design goals for your LRNG are still left not yet achieved?

The core concerns I have at this point are the following:

- correlation: the interrupt noise source is closely correlated with the HID/
block noise sources. I see that the fast_pool somehow "smears" that
correlation. However, I have not seen a full assessment showing that the
correlation is really gone. Given that I do not believe the HID event values
(key codes, mouse coordinates) carry any entropy -- the user sitting at the
console knows exactly which keys he pressed and which mouse coordinates were
generated -- and given that for block devices only the high-resolution time
stamp provides any entropy, I suggest removing the HID/block device noise
sources and keeping only the IRQ noise source. We could still record the HID
event values to further stir the pool but credit them no entropy (see the
first sketch after this list). Of course, that would imply that the assumed
entropy of an IRQ event is revalued. I am currently finishing an assessment
of how entropy behaves in a VM (and I hope that report will be released).
Please note that, contrary to my initial expectations, the IRQ events are the
only noise source that is almost unaffected by VMM operation. Hence, IRQs are
much better in a VM environment than the block or HID noise sources.

- entropy estimate: the current entropy heuristic IMHO has nothing to do with
the entropy of the incoming data. Currently, the minimum of the first, second
and third derivative of the Jiffies time stamp is taken and capped at 11;
that value is the entropy credited to the event (see the second sketch after
this list). Given that the entropy rests with the high-resolution time stamp
and not with Jiffies or the event value, I think this heuristic is not
helpful. I understand that it underestimates the available entropy on
average, but that is the only relationship I see. In my aforementioned
entropy-in-VM assessment (plus the BSI report on /dev/random, which is
unfortunately written in German but available on the Internet) I did a
min-entropy calculation based on different min-entropy formulas (SP800-90B).
That calculation shows that what we get from the noise sources is about 5 to
6 bits. On average the entropy heuristic credits between 0.5 and 1 bit per
event, so it underestimates the entropy. Yet, the entropy heuristic can
credit up to 11 bits. Here I think it becomes clear that the current entropy
heuristic is not helpful. In addition, on systems without a high-resolution
timer I assume (I have not measured it yet) that the entropy heuristic even
overestimates the entropy.

- although I like the current injection of twice the fast_pool into the
ChaCha20 (which means that the pathological case where the collection of 128
bits of entropy would result in an attack resistance of 2 * 128 and *not*
2^128 is now raised to an attack strength of 2 * 2^64; see the third sketch
after this list), /dev/urandom has *no* entropy until that injection happens.
The injection happens early in the boot cycle, but on my test system still
after user space starts. I therefore tried injecting 32 / 112 / 256 bits of
entropy "atomically" (to not fall into the aforementioned pathological-case
trap) into the /dev/urandom RNG, so that /dev/urandom is at least seeded with
a few bits before user space starts, followed by the atomic injection of the
subsequent bits.
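
To make the first bullet concrete, here is a rough sketch of what I mean by
recording the HID event values without crediting them entropy. This is not a
patch; the packing of the event word and the function name are made up, but
add_device_randomness() is the existing interface that mixes data into the
pools without crediting any entropy:

#include <linux/random.h>
#include <linux/types.h>

/*
 * Illustration only: stir the pools with the HID event data, but leave
 * all entropy accounting to the IRQ timing path.
 */
static void lrng_stir_with_hid_event(unsigned int type, unsigned int code,
				     int value)
{
	u32 word = (type << 16) ^ (code << 8) ^ (u32)value;

	/* mixes the data, credits zero entropy */
	add_device_randomness(&word, sizeof(word));
}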

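For reference, this is my reading of the delta heuristic criticized in the
second bullet, pulled out of add_timer_randomness() into a small user space
function. The names and the standalone framing are mine; only the
min-of-three-derivatives logic and the 11 bit cap reflect the kernel code:

#include <stdlib.h>

struct timer_state {
	long last_time;
	long last_delta;
	long last_delta2;
};

/*
 * Credit the number of bits in the smallest of the three Jiffies
 * derivatives, rounded down by one bit and capped at 11.
 */
static int jiffies_entropy_credit(struct timer_state *s, long jiffies_now)
{
	long delta  = jiffies_now - s->last_time;
	long delta2 = delta  - s->last_delta;
	long delta3 = delta2 - s->last_delta2;
	int bits = 0;

	s->last_time   = jiffies_now;
	s->last_delta  = delta;
	s->last_delta2 = delta2;

	delta  = labs(delta);
	delta2 = labs(delta2);
	delta3 = labs(delta3);

	if (delta > delta2)
		delta = delta2;
	if (delta > delta3)
		delta = delta3;

	for (delta >>= 1; delta; delta >>= 1)
		bits++;

	return bits > 11 ? 11 : bits;
}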

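The attack-strength numbers in the third bullet follow from the usual
piecewise-guessing argument: if an attacker can observe output between seed
injections, each injection can be brute-forced on its own. A toy illustration
(my framing, not code from either RNG):

#include <math.h>
#include <stdio.h>

/*
 * Guessing work when output is observable between injections: each of
 * the n chunks of b bits can be attacked separately, so the work is
 * roughly n * 2^b instead of 2^(n*b). Returned as a log2 value.
 */
static double piecewise_log2(int n, int b)
{
	return log2(n) + b;
}

int main(void)
{
	/* pathological case: 128 injections of 1 bit -> 2 * 128 guesses */
	printf("1 bit at a time: about 2^%.0f guesses\n",
	       piecewise_log2(128, 1));

	/* current case: two fast_pool injections of 64 bits -> 2 * 2^64 */
	printf("two fast_pools:  about 2^%.0f guesses\n",
	       piecewise_log2(2, 64));

	/* atomic injection of all 128 bits before any output is given   */
	printf("atomic:          about 2^128 guesses\n");

	return 0;
}
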
A minor issue that may not be of too much importance: if there is a user
space entropy provider waiting with select(2) or poll(2) on /dev/random (like
rngd or my jitterentropy-rngd), this provider is only woken up when somebody
pulls on /dev/random. If /dev/urandom is pulled (and the system does not
receive entropy from the add_*_randomness noise sources), the user space
provider is *not* woken up. So /dev/urandom keeps operating as a pure DRNG
even though it could use a topping-off of its entropy once in a while. In my
jitterentropy-rngd I have handled this by having the daemon, in addition to
the select(2), wake up every 5 seconds, read the entropy_avail file and start
injecting data into the kernel if it falls below a threshold (see the sketch
below). Yet, this is a hack. The wakeup call in the kernel should be placed
at a different location so that /dev/urandom also benefits from the wakeup.
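
A rough sketch of that workaround (this is not the actual jitterentropy-rngd
code, just the structure of the hack; the threshold, the amount of data and
the entropy credit are illustrative, and error handling is omitted):

#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/random.h>

#define LOW_WATERMARK 1024	/* bits below which we top up the pool */

static int read_entropy_avail(void)
{
	char buf[16] = { 0 };
	int fd = open("/proc/sys/kernel/random/entropy_avail", O_RDONLY);

	read(fd, buf, sizeof(buf) - 1);
	close(fd);
	return atoi(buf);
}

int main(void)
{
	/* RNDADDENTROPY requires CAP_SYS_ADMIN */
	int fd = open("/dev/random", O_WRONLY);
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	struct rand_pool_info *rpi = malloc(sizeof(*rpi) + 32);

	for (;;) {
		/* woken when the kernel wants entropy *or* after 5 seconds */
		poll(&pfd, 1, 5000);

		if (read_entropy_avail() >= LOW_WATERMARK)
			continue;

		rpi->entropy_count = 256;	/* bits credited           */
		rpi->buf_size = 32;		/* bytes of data delivered */
		memset(rpi->buf, 0, 32);	/* placeholder: fill from the
						 * noise source instead    */
		ioctl(fd, RNDADDENTROPY, rpi);
	}
}
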
> 
> Reading the paper, you are still claiming huge performance
> improvements over getrandom and /dev/urandom.  With the use of the
> ChaCha20 (and given that you added a ChaCha20 DRBG as well), it's not
> clear this is still an advantage over what we currently have.

I agree that with your latest changes the performance of /dev/urandom is
comparable to my implementation, considering tables 6 and 7 in my report.
Although my ChaCha20 DRNG is faster for large block sizes (470 vs. 210 MB/s
for 4096-byte blocks), you rightfully state that large block sizes do not
really matter, and hence I am not using them for the comparison.

Note that tables 6 and 7 reference the old /dev/urandom still using SHA-1.
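
For anyone who wants to quickly reproduce such a number: reading /dev/urandom
in 4096-byte blocks and dividing the byte count by the elapsed time gives a
comparable figure. This is only a quick illustration, not the measurement
setup used for the tables in the report:

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	unsigned char buf[4096];
	long long bytes = 0;
	int fd = open("/dev/urandom", O_RDONLY);
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		bytes += read(fd, buf, sizeof(buf));
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while (now.tv_sec - start.tv_sec < 10);	/* run for ~10 seconds */

	double secs = (now.tv_sec - start.tv_sec) +
		      (now.tv_nsec - start.tv_nsec) / 1e9;

	printf("%.1f MB/s\n", bytes / secs / 1e6);
	close(fd);
	return 0;
}
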
> 
> As far as whether or not you can gather enough entropy at boot time,
> what we're really talking about is how much entropy we want to assume
> can be gathered from interrupt timings, since what you do in your code
> is not all that different from what the current random driver is

Correct. I think I am doing exactly what you do regarding the entropy 
collection minus the caveats mentioned above.

> doing.  So it's pretty easy to turn a knob and say, "hey presto, we
> can get all of the entropy we need before userspace starts!"  But
> justifying this is much harder, and using statistical tests isn't
> really sufficient as far as I'm concerned.

I agree that statistics are only one hint. But as of now I have not seen any
real explanation why an IRQ event measured with a high-resolution timer
should not have 1 bit or 0.5 bits of entropy on average. All my statistical
measurements (see my LRNG paper and, hopefully, my VM assessment paper once
it is released) indicate that each high-resolution time stamp of an IRQ has
at least 4 bits of entropy, even when the system is under attack. Either one
bit or 0.5 bits is more than enough to have a properly working /dev/random
even in virtual environments, embedded systems, headless systems, systems
with SSDs, systems using a device mapper, etc. All those types of systems
currently suffer heavy penalties because of the correlation problem I
mentioned in the first bullet above.
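
To illustrate the kind of estimate behind the "at least 4 bits" statement:
the simplest SP800-90B estimator is the most-common-value bound, roughly
-log2 of the relative frequency of the most frequent sample (the standard
adds a confidence-interval correction that I omit here). A minimal sketch
over 8-bit samples, e.g. the low 8 bits of successive high-resolution time
stamp deltas:

#include <math.h>
#include <stdio.h>

/* Returns the most-common-value min-entropy estimate in bits per sample. */
static double mcv_min_entropy(const unsigned char *samples, size_t n)
{
	unsigned long count[256] = { 0 };
	unsigned long max = 0;

	for (size_t i = 0; i < n; i++)
		count[samples[i]]++;

	for (int v = 0; v < 256; v++)
		if (count[v] > max)
			max = count[v];

	return -log2((double)max / (double)n);
}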

Finally, one remark which I know you could not care less about: :-)

I try to use a known DRNG design that a lot of folks have already assessed --
SP800-90A (and please do not point to the Dual EC DRBG, as that issue was
identified by researchers shortly after the first SP800-90A came out in
2007). This way I do not need to re-invent the wheel and potentially forget
about things that may be helpful in a DRNG. To allow researchers to assess my
ChaCha20 DRNG (which is used when the kernel crypto API is not compiled in)
independently from the kernel, I extracted the ChaCha20 DRNG code into a
standalone DRNG available at [1]. This standalone implementation can be
debugged and studied in user space. Moreover, it is a straight copy of the
kernel code, to allow researchers an easy comparison.

[1] http://www.chronox.de/chacha20_drng.html

Ciao
Stephan