Re: [PATCH v5 2/5] iio: consumers: copy/release available info from producer to fix race

Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> · Wed, 30 Oct 2024 17:35:45 +0000

On Wed, 30 Oct 2024 16:47:50 +0200
Andy Shevchenko <andriy.shevchenko@xxxxxxxxx> wrote:

> On Mon, Oct 21, 2024 at 02:54:15PM +0200, Matteo Martelli wrote:
> > Consumers need to call the producer's read_avail_release_resource()
> > callback after reading producer's available info. To avoid a race
> > condition with the producer unregistration, change inkern
> > iio_channel_read_avail() so that it copies the available info from the
> > producer and immediately calls its release callback with info_exists
> > locked.
> > 
> > Also, modify the users of iio_read_avail_channel_raw() and
> > iio_read_avail_channel_attribute() to free the copied available buffers
> > after calling these functions. To let users free the copied buffer with
> > a cleanup pattern, also add a iio_read_avail_channel_attr_retvals()
> > consumer helper that is equivalent to iio_read_avail_channel_attribute()
> > but stores the available values in the returned variable.  
> 
> ...
> 
> > +static void dpot_dac_read_avail_release_res(struct iio_dev *indio_dev,
> > +					    struct iio_chan_spec const *chan,
> > +					    const int *vals, long mask)
> > +{
> > +	kfree(vals);
> > +}
> > +
> >  static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> >  			      struct iio_chan_spec const *chan,
> >  			      int val, int val2, long mask)
> > @@ -125,6 +132,7 @@ static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> >  static const struct iio_info dpot_dac_info = {
> >  	.read_raw = dpot_dac_read_raw,
> >  	.read_avail = dpot_dac_read_avail,
> > +	.read_avail_release_resource = dpot_dac_read_avail_release_res,
> >  	.write_raw = dpot_dac_write_raw,
> >  };  
> 
> I have a problem with this approach. The issue is that we allocate
> memory in one place and must clear it in another. This is not well
> designed thingy in my opinion.

It is a tricky corner and we've not yet come up with a better
solution :(  I think one of the earlier versions did just always copy
and the reviews suggested that was painful given we are fixing a tiny
percentage of devices.  Hence we ended up with what is effectively
an optional copy if the provider knows the data is volatile.

So there are two 'potential' copies here and we need to be careful
to separate them for purposes of discussion.
A) Copy in provider if it has volatile available data.  In that
   case the copy is done in a call to it via read_avail, and release
   via a call to read_avail_release_resource().  So to my mind locally
   the same as any acquire / release pair.
B) Copy in the core for the case where we need the lifetime to persist.
   That is a effectively a kmemdup() call so we could call back to the
   core to release it but it would just be a kfree() wrapper.

(A) Only occurs in a tiny subset of drivers, most use non volatile data
    for read avail (constant, or constant after probe).
(B) Only occurs for consumer drivers that directly use the avail data.
    There are very few of those and no other sane way of solving this
    because we can't hold a lock into the provider for an unknown
    (long) time.

> I was thinking a bit of the solution and
> at least these two comes to my mind:
> 
> 1) having a special callback for .read_avail_with_copy (choose better
> name) that will dump the data to the intermediate buffer and clean it
> after all;

So we have that allocate the data in the provider and hand it to the
consumer which then frees it with kfree() in all cases?

Note that's what we do for the inkern interfaces (the ones consumer
drivers have to use), just in the core not the providers because that
corner is hard to close any other way. In this rare case we end up
potentially copying twice.

For the special cases where the buffer isn't passed on beyond
functions that are part of the IIO core, we avoid the need for that
(potentially second, probably only) copy because we can always
ensure the release call is made.  Note this is the common case
by far. It's the pretty printing done by the core to present this
data to sysfs etc, or to find out the max value for some consumer
that doesn't need the whole set.

> 
> 2) introduce a new type (or bit there), like IIO_AVAIL_LIST_ALLOC.

The special handling that will need seems likely to be no more
obvious than the handling we have here.  I'm not really
sure how it would work.

> 
> In any case it looks fragile and not scalable. I propose to drop this
> and think again.
> 
> Yes, yes, I'm fully aware about the problem you are trying to solve and
> agree on the report, I think this solution is not good enough.

I'll back this out of my tree for now so the discussion can carry on.

Jonathan