Re: arc4random - are you sure we want these?

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Sun, 24 Jul 2022 00:54:16 +0200

Hi Adhemerval,

Thanks for your reply.

On Sat, Jul 23, 2022 at 02:39:29PM -0300, Adhemerval Zanella Netto wrote:
> > Firstly, for what use cases does this actually help? As of recent
> > changes to the Linux kernels -- now backported all the way to 4.9! --
> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
> > states locklessly. Sure you avoid a syscall by doing that in userspace,
> > but does it really matter? Who exactly benefits from this?
> 
> Mainly performance, since glibc both export getrandom and getentropy. 

Okay so your motivation is performance. But can you tell me what your
performance goals actually are? All kernel.org stable kernels from 4.9
and upwards now have really fast per-cpu lockless implementations of
getrandom() and /dev/urandom. If your goal is performance, I would be
very, very interested to find out a circumstance where this is
insufficient.

> There were some discussion on maillist and we also decided to explicit
> state this is not a CSRNG on our documentation.

Okay that's all the more reason why this is a completely garbage
endeavor. Sorry for the strong language, but the last thing anybody
needs is another PRNG that's "half way" between being good for crypto
and not. If it's not good for crypto, people will use it anyway,
especially since you're winking at them saying, "oh but actually
chacha20 is fine technically so....", and then fast-forward a few years
when you realize you can lean on your non-crypto commitment and make
things different. Never underestimate the power of a poorly defined
function definition. If your goal isn't to make a real CSPRNG, why make
this kind of thing at all?

And it's especially ridiculous since the OpenBSD arc4random *is* used
for crypto. So now you've really muddied the waters. (And naturally the
OpenBSD arc4random was done in conjunction with their kernel
development, since the same people work on both, which isn't what's
happened here.)

So your "it's a CSPRNG wink wink but the documentation says not, so
actually we're off the hook for doing this well" is a cop-out that will
lead to trouble.

Going back to my original point: what are the performance requirements
that point toward a userspace RNG being required here? If it's not
actually necessary, then let's not do this. If it is necessary for some
legitimate widespread reason, then let's do this right, and actually
make something you're comfortable calling cryptographically secure. And
let's get this right from the beginning, so that the new interface
doesn't come with all sorts of caveats, "this is safe for glibc ≥ 
4.3.2.1 only", or whatever else.

Again, I'm not adverse to the general concept. I just haven't seen
anything really justifying adding the complexity for it. And then
assuming that justification does exist somewhere, this approach doesn't
seem to be a particularly well planned one. As soon as you find yourself
reaching for the "documentation cop-out", something has gone amiss.

> The vDSO approach would be good think and if even the kernel provides it
> I think it would feasible to wire-up arc4random to use it if the  underlying
> kernel supports it.

So if you justify the performance requirement, wouldn't it make more
sense to just back getrandom() itself with a vDSO call? So that way,
kernels with that get bits faster (but by how much, really? c'mon...),
and kernels without it have things as normal as possible.

If your concern is instances in which getrandom() can fail, I'd like to
here what those concerns are so that interface can be fixed and
improved.

> But in the end I think if we are clear about in on the documentation,
> and provide alternative when the users are aware of the limitation, I do
> not think it is bad decision.

This really strikes me as an almost comically ominous expectation.
Design interfaces that don't have dangerous pitfalls. While
documentation might somehow technically absolve you of responsibility,
it doesn't actually help make the ecosystem safer by providing optimal
interfaces that don't have cop outs.

Anyway, to reiterate:

- Can you show me some concerning performance numbers on the current
  batch of kernel.org stable kernels, and the use cases for which those
  numbers are concerning, and how widespread you think those use cases
  are?

- If this really *is* necessary for some reason, can we do it well out
  of the gate, with good coordination between kernel and userland,
  instead of half-assing it initially and covering that up with a
  documentation note?

Jason