Re: [PATCH] hwrng: core - fix task hung in hwrng_fillfn

Edward Adam Davis <eadavis@xxxxxx> · Fri, 1 Dec 2023 19:37:39 +0800

On Fri, 1 Dec 2023 17:56:59 +0800, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index 420f155d251f..7323ddc958ce 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -225,17 +225,18 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >                        goto out;
> >                }
> >
> > -               if (mutex_lock_interruptible(&reading_mutex)) {
> > -                       err = -ERESTARTSYS;
> > -                       goto out_put;
> > -               }
> >                if (!data_avail) {
> > +                       if (mutex_lock_interruptible(&reading_mutex)) {
> > +                               err = -ERESTARTSYS;
> > +                               goto out_put;
> > +                       }
> >                        bytes_read = rng_get_data(rng, rng_buffer,
> >                                rng_buffer_size(),
> >                                !(filp->f_flags & O_NONBLOCK));
> > +                       mutex_unlock(&reading_mutex);
> >                        if (bytes_read < 0) {
> >                                err = bytes_read;
> > -                               goto out_unlock_reading;
> > +                               goto out_put;
> >                        }
> >                        data_avail = bytes_read;
> >                }
> 
> Does this change anything at all? Please explain why it was holding
> this lock for 143 seconds in the first place.  If it's doing it in
> rng_get_data, then your change has zero effect.
Reduce the scope of critical zone protection.
The original critical zone contains a too large range, especially like 
copy_to_user() should not be included in the critical zone.
> 
> > @@ -501,7 +499,10 @@ static int hwrng_fillfn(void *unused)
> >                rng = get_current_rng();
> >                if (IS_ERR(rng) || !rng)
> >                        break;
> > -               mutex_lock(&reading_mutex);
> > +               if (mutex_lock_interruptible(&reading_mutex)) {
> > +                       put_rng(rng);
> > +                       return -ERESTARTSYS;
> > +               }
> 
> No this is just the symptom.  The real problem is why is the driver
> spending 143 seconds in rng_get_data?
In the second version of the patch, I have removed the fix in hwrng_fillfn().
But for some reason, the V2 patch did not appear in the mailing list.

I think it was due to consuming too much time while executing copy_to_user() 
that resulted in 143s.
So, I narrowed down the scope of the critical area and moved the code 
copy_to_user() out of the critical area.

Edward