Re: [PATCH v2] arc4random: simplify design for better safety

Adhemerval Zanella Netto <adhemerval.zanella@xxxxxxxxxx> · Tue, 26 Jul 2022 09:34:57 -0300

On 26/07/22 08:54, Jason A. Donenfeld wrote:
> Hi Adhemerval,
> 
> Thanks for your review.
> 
> On Tue, Jul 26, 2022 at 08:33:23AM -0300, Adhemerval Zanella Netto wrote:
>> Ther are some missing pieces, like sysdeps/unix/sysv/linux/tls-internal.h comment,
>> sysdeps/generic/tls-internal-struct.h generic piece (it is used on hurd build),
>> maybe also change the NEWS to state this is not a CSPRNG, and we definitely need
>> to update the manual. Some comments below.
> 
> I think Eric already pointed those out, and they're fixed in v3 now.
> PTAL.
> 
>>> +  static bool have_getrandom = true, seen_initialized = false;
>>> +  int fd;
>>
>> I think it should reasonable to assume that getrandom syscall will be always
>> supported and using arc4random in an enviroment with filtered getrandom does
>> not make much sense.  We are trying to avoid add this static syscall checks
>> where possible,
> 
> I don't know glibc's requirements for kernels, though I do know that
> it'd be nice to not have to write this fallback code in every program I
> write and just use libc's thing. So in that sense, having the fallback
> to /dev/urandom makes arc4random_buf a lot more useful. But with that
> said, yea, maybe we shouldn't care about old kernels? getrandom is now
> quite old and the stable kernels on kernel.org all have it.

We do not enforce kernels version anymore, although we still support the
--enable-kernel=x.y that changes on how glibc internally assume some syscall
(so there is no need to fallback if it were the case).

So the question is where we need the fallback code for --enable-kernel=3.17.
If kernel is returning ENOSYS in this case (and assuming you are running on
kernel newer than 3.17) it means some syscall filtering, and I am not sure
we should need to actually handle it.  The main idea of adding this minor
optimization is to once we increase the minimum supported kernel we can
clean this code up.

> 
> From my perspective, I don't have a strongly developed opinion on what
> makes sense for glibc. If Florian agrees with you, I'll send a v+1 with
> the fallback code removed. If it's contentious, maybe the fallback code
> should stay in and we can slate it for removal on another day, when the
> minimum glibc kernel version gets raised or something like that.

I think the fallback code make sense since the minimum supported kernel we
still support is 3.2, although I am not sure how getrandom and/or /dev/urandom
will play in such older kernels.

> 
>> also plain load/store to se the static have_getrandom
>> is strickly a race-condition, although it should not really matter (we use
>> relaxed load/store in such optimization (check
>> sysdeps/unix/sysv/linux/mips/mips64/getdents64.c).
> 
> I was aware of the race but figured it didn't matter, since two racing
> threads will both set it to the same result eventually. But I didn't
> know about the convention of using those relaxed wrapper functions.
> Thanks for the tip. I'll do that for v4.
> 
>> Also, does it make sense to fallback if we build for a kernel that should
>> always support getrandom?
> 
> I guess only if syscall filtering is a concern. But if not, then maybe
> yea? We could do this in a follow-up commit, or I could do this in v4.
> Would `#if __LINUX_KERNEL_VERSION >` be the right mechanism to use here?
> If so, I think the way I'd implement that would be:
> 
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index 978bf9287f..a33d9ff2c5 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -44,8 +44,10 @@ __arc4random_buf (void *p, size_t n)
>      {
>        ssize_t l;
> 
> +#if __LINUX_KERNEL_VERSION < something
>        if (!atomic_load_relaxed (&have_getrandom))
>  	break;
> +#endif> 
>        l = __getrandom_nocancel (p, n, 0);
>        if (l > 0)
> @@ -60,11 +62,13 @@ __arc4random_buf (void *p, size_t n)
>  	arc4random_getrandom_failure (); /* Weird, should never happen. */
>        else if (l == -EINTR)
>  	continue; /* Interrupted by a signal; keep going. */
> +#if __LINUX_KERNEL_VERSION < something
>        else if (l == -ENOSYS)
>  	{
>  	  atomic_store_relaxed (&have_getrandom, false);
>  	  break; /* No syscall, so fallback to /dev/urandom. */
>  	}
> +#endif
>        arc4random_getrandom_failure (); /* Unknown error, should never happen. */
>      }
> 
> And then arc4random_getrandom_failure() being a noreturn function would
> make gcc optimize out the rest.
> 
> Does that seem like a good approach?

I think so, although he __LINUX_KERNEL_VERSION is Linux-only that should 
be moved to sysdeps/unix/sysv/linux.

Usually we do as a wrapper (static inline or hidden symbol), with the generic
implementation on sysdep/generic or include with Linux redefining on its own
folder.

We also a use __ASSUME macros (check sysdeps/unix/sysv/linux/kernel-features.h),
it should be something like __ASSUME_GETRANDOM (we did not have a use for it 
because we do not want a fallback for getrandom implementation).

So I would add something like:

sysdeps/unix/sysv/linux/arc4random_impl.h

  static inline int getentropy_arch (void *p, size_t n)
  {
    for (;;)
      {
        ssize_t l = __getrandom_nocancel (p, n, 0);
        if (l > 0)
          {
            if (l == n)
             return true;
          }
        else if (l == 0)
          return -1;
        else if (l == -EINTR)
         continue;

  #if !__ASSUME_GETRANDOM
        if (l == -ENOSYS)
          return 0;
  #endif
        return -1;
      }
    return 1;
  }

And on stdlib/arc4random.c:

  void
  __arc4random_buf (void *p, size_t n)
  {
    if (n == 0)
      return;

    int s = getentropy_arch (p, n);
    if (s > 0)
      return;
    if (s < 0)
      arc4random_getrandom_failure ()

    /* Fallback.  */
  }

> 
>>> +      l = __getrandom_nocancel (p, n, 0);
>>
>> Do we need to worry about a potentially uncancellable blocking call here? I guess
>> using GRND_NONBLOCK does not really help.
> 
> No, generally not. Also, keep in mind that getrandom(0) will trigger
> jitter entropy if the kernel isn't already initialized.

Maybe add a comment stating it.

> 
>>
>>> +      if (l > 0)
>>> +	{
>>> +	  if ((size_t) l == n)
>>
>> Do we need the cast here?
> 
> Generally it's frowned upon to have implicit signed conversion, right? l
> is signed while n is unsigned.

Good question, I don't think we enforce it in fact.

> 
>>
>>> +	    return; /* Done reading, success. */
>>
>> Minor style issue: use double space before period.
> 
> I was really confused by this, and then opened up some other files and
> saw you meant *after* period. :) Will do for v4.

Yeah, I meant after indeed.

> 
>> As Florian said we will need a non cancellable poll here.  Since you are setting
>> the timeout as undefined, I think it would be simple to just add a non cancellable
>> wrapper as:
>>
>>   int __ppoll_noncancel_notimeout (struct pollfd *fds, nfds_t nfds)
>>   {
>>   #ifndef __NR_ppoll_time64
>>   # define __NR_ppoll_time64 __NR_ppoll
>>   #endif
>>      return INLINE_SYSCALL_CALL (__NR_ppoll_time64, fds, nfds, NULL, NULL, 0);
>>   }
>>
>> So we don't need to handle the timeout for 64-bit time_t wrappers.
> 
> Oh that sounds like a good solution to the time64 situation. I'll do
> that for v4... BUT, I already implemented possibly the wrong solution
> for v3. Could you take a look at what I did there and confirm that it's
> wrong? If so, then I'll do exactly what you suggested here.
> 
> Thanks again for the review,
> Jason