Currently, if fips_enabled is set, a per-IRQ min-entropy estimate of
either 1 bit or 1/8 bit is assumed, depending on whether a high
resolution get_cycles() is available or not. The statistical NIST
SP800-90B startup health tests are run on a certain number of noise
samples and are intended to reject in case this hypothesis turns out to
be wrong, i.e. if the actual min-entropy is smaller. As long as the
startup tests haven't finished, entropy dispatch, and thus the initial
crng seeding, is inhibited. On test failure, the startup tests restart
themselves from the beginning.

It follows that in case a system's actual per-IRQ min-entropy is smaller
than the more or less arbitrarily assessed 1 bit or 1/8 bit,
respectively, there is a good chance that the initial crng seeding will
never complete. AFAICT, such a situation could potentially prevent
certain userspace daemons like OpenSSH from loading.

In order to still be able to make progress, make
add_interrupt_randomness() halve the per-IRQ min-entropy estimate upon
each health test failure, but only until the minimum supported value of
1/64 bit has been reached. Note that health test failures already cause
a restart of the startup health tests and thus, a certain number of
additional noise samples, i.e. IRQ events, will have to get examined by
the health tests before the initial crng seeding can take place. This
number of fresh events required is inversely proportional to the
estimated per-IRQ min-entropy H: for the Adaptive Proportion Test (APT)
it equals ~128 / H. It follows that this patch won't be of much help for
embedded systems or VMs with poor IRQ rates at boot time, at least not
without manual intervention. But there aren't many options left when
fips_enabled is set.

With respect to NIST SP800-90B conformance, this patch enters kind of a
gray area: NIST SP800-90B has no notion of such a dynamically adjusted
min-entropy estimate.
Instead, it is assumed that some fixed value has been estimated based on
general principles and subsequently validated in the course of the
certification process. However, I would argue that if a system had
successfully passed certification for 1 bit or 1/8 bit, respectively, of
estimated min-entropy per sample, it would automatically be approved for
all smaller values as well. Had we started out with such a lower value
passing the health tests from the beginning, the latter would never have
complained in the first place and the system would have come up just
fine.

Finally, note that all statistical tests have a non-zero probability of
false positives and so do the NIST SP800-90B health tests. In order to
not keep the estimated per-IRQ entropy at a smaller level than necessary
forever after spurious health test failures, make
add_interrupt_randomness() attempt to double it again after a certain
number of successful health test passes at the degraded entropy level
have been completed. This threshold should not be too small in order to
avoid excessive entropy accounting loss due to continuously alternating
between a too large per-IRQ entropy estimate and the next smaller value.
For now, choose a value of five as a compromise between quick recovery
and limiting said accounting loss.

So, introduce a new member ->good_tests to struct fast_pool for keeping
track of the number of successful health test passes. Make
add_interrupt_randomness() increment it upon successful health test
completion and reset it to zero on failures. Make
add_interrupt_randomness() double the current min-entropy estimate and
restart the startup health tests in case ->good_tests is > 4 and the
entropy estimate had previously been lowered.
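
For illustration only, the adjustment policy described above can be
sketched as a standalone userspace model. All struct, macro and function
names below are hypothetical and are not part of random.c; only the
bound of 6 (H = 1/64 bit), the "more than 4 good tests" threshold and
the ~128 / H figure for the APT are taken from this description. The
model collapses the separate code paths of add_interrupt_randomness()
into two callbacks, so it is a simplification rather than the actual
implementation.

```c
#include <assert.h>

/* Hypothetical names; the values 6 and 5 mirror the patch description. */
#define MAX_ENTROPY_SHIFT	6	/* H = 2^-6 = 1/64 bit per IRQ */
#define GOOD_TESTS_NEEDED	5	/* the patch checks good_tests > 4 */

struct estimator {
	unsigned int min_shift;	 /* stand-in for min_irq_event_entropy_shift() */
	unsigned int shift;	 /* current estimate is H = 2^-shift bits */
	unsigned int good_tests; /* passes since the last failure */
};

/* Health test failure: halve H, but never go below 1/64 bit. */
static void on_test_failure(struct estimator *e)
{
	if (e->shift < MAX_ENTROPY_SHIFT)
		e->shift++;
	e->good_tests = 0;
}

/*
 * Health test pass: once enough passes have accumulated at a degraded
 * level, try the next better estimate again. Like the diff, this does
 * not reset good_tests when raising the estimate.
 */
static void on_test_pass(struct estimator *e)
{
	e->good_tests++;
	if (e->shift > e->min_shift && e->good_tests >= GOOD_TESTS_NEEDED)
		e->shift--;
}

/* Approximate number of fresh IRQ events the APT needs: ~128 / H. */
static unsigned int apt_events_needed(unsigned int shift)
{
	assert(shift <= MAX_ENTROPY_SHIFT);
	return 128u << shift;
}
```

Starting from a shift of 0 (H = 1 bit), two failures in this model
yield a shift of 2 (H = 1/4 bit), for which the APT would need roughly
512 fresh events. Five passes later, the estimate is raised back to a
shift of 1.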

Signed-off-by: Nicolai Stange <nstange@xxxxxxx>
---
 drivers/char/random.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bb79dcb96882..24c09ba9d7d0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1126,6 +1126,7 @@ struct fast_pool {
 	bool dispatch_needed : 1;
 	bool discard_needed : 1;
 	int event_entropy_shift;
+	unsigned int good_tests;
 	struct queued_entropy q;
 	struct health_test health;
 };
@@ -1926,9 +1927,13 @@ void add_interrupt_randomness(int irq, int irq_flags)
 						     cycles);
 		if (unlikely(health_result == health_discard)) {
 			/*
-			 * Oops, something's odd. Restart the startup
-			 * tests.
+			 * Oops, something's odd. Lower the entropy
+			 * estimate and restart the startup tests.
 			 */
+			fast_pool->event_entropy_shift =
+				min_t(unsigned int,
+				      fast_pool->event_entropy_shift + 1, 6);
+			fast_pool->good_tests = 0;
 			health_test_reset(&fast_pool->health,
 					  fast_pool->event_entropy_shift);
 		}
@@ -1951,6 +1956,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 		 * entropy discard request?
 		 */
 		fast_pool->dispatch_needed = !fast_pool->discard_needed;
+		fast_pool->good_tests++;
 		break;
 
 	case health_discard:
@@ -2005,6 +2011,21 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	if (fast_pool->dispatch_needed || health_result == health_none) {
 		reseed = __dispatch_queued_entropy_fast(r, q);
 		fast_pool->dispatch_needed = false;
+
+		/*
+		 * In case the estimated per-IRQ min-entropy had to be
+		 * lowered due to health test failure, but the lower
+		 * value has proven to withstand the tests for some
+		 * time now, try to give the next better value another
+		 * shot.
+		 */
+		if (unlikely((fast_pool->event_entropy_shift >
+			      min_irq_event_entropy_shift())) &&
+		    fast_pool->good_tests > 4) {
+			fast_pool->event_entropy_shift--;
+			health_test_reset(&fast_pool->health,
+					  fast_pool->event_entropy_shift);
+		}
 	} else if (fast_pool->discard_needed) {
 		int dummy;
 
-- 
2.26.2