Hi Florian, On Tue, Jul 26, 2022 at 11:55:23AM +0200, Florian Weimer wrote: > * Jason A. Donenfeld: > > > + pfd.fd = TEMP_FAILURE_RETRY ( > > + __open64_nocancel ("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY)); > > + if (pfd.fd < 0) > > + arc4random_getrandom_failure (); > > + if (__poll (&pfd, 1, -1) < 0) > > + arc4random_getrandom_failure (); > > + if (__close_nocancel (pfd.fd) < 0) > > + arc4random_getrandom_failure (); > > What happens if /dev/random is actually /dev/urandom? Will the poll > call fail? Yes. I'm unsure if you're asking this because it'd be a nice simplification to only have to open one fd, or because you're worried about confusion. I don't think the confusion problem is one we should take too seriously, but if you're concerned, we can always fstat and check the maj/min. Seems a bit much, though. > I think we need a no-cancel variant of poll here, and we also need to > handle EINTR gracefully. Thanks for the note about poll nocancel. I'll try to add this. I don't totally know how to manage that pluming, but I'll give it my best shot. > Performance-wise, my 1000 element shuffle benchmark runs about 14 times > slower without userspace buffering. (For comparison, just removing > ChaCha20 while keeping a 256-byte buffer makes it run roughly 25% slower > than current master.) Our random() implementation is quite slow, so > arc4random() as a replacement call is competitive. The unbuffered > version, not so much. Yes, as mentioned, this is slower. But let's get something down first that's *correct*, and then after we can start optimizing it. Let's not prematurely optimize and create a problematic function that nobody should use. > Running the benchmark, I see 40% of the time spent in chacha_permute in > the kernel, that is really quite odd. Why doesn't the system call > overhead dominate? Huh, that is interesting. I guess if you're reading 4 bytes for an integer, it winds up computing a whole chacha block each time, with half of it doing fast key erasure and half of it being returnable to the caller. When we later figure out a safer way to buffer, ostensibly this will go away. But for now, we really should not prematurely optimize. I'll have v3 out shortly with your suggested fixes. Jason