Re: [PATCH v5 2/5] iio: consumers: copy/release available info from producer to fix race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 19, 2024 at 12:25:18PM +0100, Matteo Martelli wrote:
> On Mon, 18 Nov 2024 18:05:35 +0200, Andy Shevchenko <andriy.shevchenko@xxxxxxxxx> wrote:
> > On Mon, Nov 18, 2024 at 03:45:25PM +0100, Matteo Martelli wrote:
> > > On Mon, 18 Nov 2024 12:21:44 +0200, Andy Shevchenko <andriy.shevchenko@xxxxxxxxx> wrote:
> > > > On Fri, Nov 15, 2024 at 03:25:06PM +0100, Matteo Martelli wrote:
> > > > > On Thu, 31 Oct 2024 19:06:32 +0100, Matteo Martelli <matteomartelli3@xxxxxxxxx> wrote:
> > > > > > Quoting Jonathan Cameron (2024-10-31 15:31:29)
> > > > > > > On Thu, 31 Oct 2024 12:26:24 +0100
> > > > > > > Matteo Martelli <matteomartelli3@xxxxxxxxx> wrote:
> > > > > > > > Quoting Jonathan Cameron (2024-10-30 21:30:50)
> > > > > > > > > On Wed, 30 Oct 2024 19:23:21 +0100
> > > > > > > > > Matteo Martelli <matteomartelli3@xxxxxxxxx> wrote:
> > > > > > > > > > Quoting Andy Shevchenko (2024-10-30 15:47:50)  
> > > > > > > > > > > On Mon, Oct 21, 2024 at 02:54:15PM +0200, Matteo Martelli wrote:    

> > > > > > > > > > > > Consumers need to call the producer's read_avail_release_resource()
> > > > > > > > > > > > callback after reading producer's available info. To avoid a race
> > > > > > > > > > > > condition with the producer unregistration, change inkern
> > > > > > > > > > > > iio_channel_read_avail() so that it copies the available info from the
> > > > > > > > > > > > producer and immediately calls its release callback with info_exists
> > > > > > > > > > > > locked.
> > > > > > > > > > > > 
> > > > > > > > > > > > Also, modify the users of iio_read_avail_channel_raw() and
> > > > > > > > > > > > iio_read_avail_channel_attribute() to free the copied available buffers
> > > > > > > > > > > > after calling these functions. To let users free the copied buffer with
> > > > > > > > > > > > a cleanup pattern, also add a iio_read_avail_channel_attr_retvals()
> > > > > > > > > > > > consumer helper that is equivalent to iio_read_avail_channel_attribute()
> > > > > > > > > > > > but stores the available values in the returned variable.    

...

> > > > > > > > > > > > +static void dpot_dac_read_avail_release_res(struct iio_dev *indio_dev,
> > > > > > > > > > > > +                                         struct iio_chan_spec const *chan,
> > > > > > > > > > > > +                                         const int *vals, long mask)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +     kfree(vals);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> > > > > > > > > > > >                             struct iio_chan_spec const *chan,
> > > > > > > > > > > >                             int val, int val2, long mask)
> > > > > > > > > > > > @@ -125,6 +132,7 @@ static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> > > > > > > > > > > >  static const struct iio_info dpot_dac_info = {
> > > > > > > > > > > >       .read_raw = dpot_dac_read_raw,
> > > > > > > > > > > >       .read_avail = dpot_dac_read_avail,
> > > > > > > > > > > > +     .read_avail_release_resource = dpot_dac_read_avail_release_res,
> > > > > > > > > > > >       .write_raw = dpot_dac_write_raw,
> > > > > > > > > > > >  };    
> > > > > > > > > > > 
> > > > > > > > > > > I have a problem with this approach. The issue is that we allocate
> > > > > > > > > > > memory in one place and must clear it in another. This is not well
> > > > > > > > > > > designed thingy in my opinion. I was thinking a bit of the solution and
> > > > > > > > > > > at least these two comes to my mind:
> > > > > > > > > > > 
> > > > > > > > > > > 1) having a special callback for .read_avail_with_copy (choose better
> > > > > > > > > > > name) that will dump the data to the intermediate buffer and clean it
> > > > > > > > > > > after all;
> > > > > > > > > > > 
> > > > > > > > > > > 2) introduce a new type (or bit there), like IIO_AVAIL_LIST_ALLOC.    
> > > > > > > > > > 
> > > > > > > > > > Could you elaborate more about these potential solutions? Maybe with some
> > > > > > > > > > usage examples?
> > > > > > > > > > 
> > > > > > > > > > If I get it correctly, in both cases you are suggesting to pass ownership
> > > > > > > > > > of the vals buffer to the caller, iio_read_channel_info_avail() in this
> > > > > > > > > > case, so that it would take care of freeing the buffer after calling
> > > > > > > > > > iio_format_after_*(). We considered this approach during an initial
> > > > > > > > > > discussion with Jonathan (see read_avail_ext() in [1]), where he suggested
> > > > > > > > > > to let the driver keep the release control through a callback for two
> > > > > > > > > > reasons:
> > > > > > > > > > 
> > > > > > > > > > 1) Apparently it's a bad pattern to pass the buffer ownership to the core,
> > > > > > > > > >    maybe Jonathan can elaborate why? The risk I can think of is that the driver
> > > > > > > > > >    could still keep the buffer copy in its private data after giving it away,
> > > > > > > > > >    resulting in fact in a double ownership. However I think it would be clear
> > > > > > > > > >    enough in this case that the copy should be handled by the caller, or maybe
> > > > > > > > > >    not?  
> > > > > > > > > Mostly the lack of desire to have to copy for the 95% of cases where it's
> > > > > > > > > not needed and that it prevents any optimization like you mention.  
> > > > > > > > 
> > > > > > > > I think the suggestion here is to add an additional .read_avail_with_copy()
> > > > > > > > without replacing the original .read_avail(), so all the current drivers that
> > > > > > > > use a constant avail list would not be affected.

> > > > Yes.

> > > > > > > > And I think this was the same
> > > > > > > > idea for the additional read_avail_ext() or the additional argument for the
> > > > > > > > read_avail() we were considering in [1]. So I would think that
> > > > > > > > iio_read_channel_info_avail() would do something like the following:
> > > > > > > > 
> > > > > > > >     if (indio_dev->info->read_avail_with_copy)
> > > > > > > >         indio_dev->info->read_avail_with_copy(vals);
> > > > > > > >     else
> > > > > > > >         indio_dev->info->read_avail(vals);
> > > > > > > > 
> > > > > > > >     ...
> > > > > > > >     iio_format_avail_list(vals);
> > > > > > > >     ...
> > > > > > > > 
> > > > > > > >     if (indio_dev->info->read_avail_with_copy)
> > > > > > > >         kfree(vals);
> > > > 
> > > > Right. At least that's what I see can be done with the existing users.
> > > > 
> > > > > > > Ok, sure that would work, but...
> > > > > > > 
> > > > > > > I don't really see this as being much less fragile than
> > > > > > > the existing solution + in cases that we do have where
> > > > > > > only some available are not const we will have to copy them
> > > > > > > all.
> > > > > > > 
> > > > > > > If anything it's more complex than making it a driver problem
> > > > > > > to provide the release call however it wants to do it.
> > > > 
> > > > ...but make a driver to allocate what's needed as well then.
> > > > 
> > > > > > > > And the drivers would choose whether to define the read_avail or the
> > > > > > > > read_avail_with_copy.
> > > > 
> > > > Either way drivers should know what to do with a data supplied to read_aval().
> > > > In one case we assume the [simple] workflow in the core, in the other we all
> > > > rely on the driver. Current approach makes a mix of these two. And that's what
> > > > I don't like.
> > > 
> > > If I understand your concern correctly, you are referring to the inkern
> > > iio_channel_read_avail() that makes the allocation for the consumer's
> > > buffer copy and you are suggesting that such copy should be done by the
> > > consumer driver code itself, this to be consistent with the producer
> > > drivers which directly handle the allocation of the copy.
> > 
> > One of the options, yes.
> > 
> > > One thing to notice is that the inkern iio_channel_read_avail() does
> > > together producer->read_avail() + copy + producer->read_avail_release()
> > > with info_exists locked. Also, the consumer driver would need to know
> > > the avail buffer size to allocate the buffer copy prior the
> > > iio_channel_read_avail() call, but such size is unknown before calling
> > > the actual producer's read_avail(). This would mean calling the
> > > producer's read_avail() and read_avail_release() callbacks separately
> > > without the lock held, with the risk of a memleak if the producer is
> > > unregistered between those calls.
> > 
> > Thanks for explaining this, but it even more makes me think that the design
> > is broken and your approach is rather a hack. So, what's the problem to
> > make IIO core to take care of the allocating and cleaning then without driver
> > being involved? Yes, this might require a hint from the driver on what to copy
> > if we want to avoid copying everything.
> 
> I am not particularly against it, other than the concerns that have
> emerged during this (and previous) discussion. Let me summarize them:

Thank you for a very good summary and fix-N proposals. I think I have nothing
to add and we should wait for Jonathan to finally choose (or propose a fix-N+1)
here.

> fix-1) the current one. Your concerns are:
>     * for consumers the copy allocation is taken care by the inkern API
>       but the release is handled by the consumer driver code, making it
>       a fragile design.
>     * consumers and producers manage the allocation differently, the
>       first handles it via the inkern API, the second one in the
>       producer driver code, making it inconsistent.
> 
> fix-2) adding a read_avail_with_copy(): a driver with both const avail
> lists and mutable avail lists would always return a copy for all of
> them, including the const ones. Example above.
> 
> fix-3) adding a release_avail return param to read_avail(): this would
> require a change to all the drivers using it. Also it
> looks to me an unusual pattern, are there other similar patterns around
> the codebase? Example below.
> 
> fix-4) adding a new enum variant to the avail type like
> IIO_AVAIL_LIST_ALLOC: to me this looks hacky as it mixes the logic type
> of the data structure and how it is handled in memory. I think the
> latter should better fit in a different field, however this modification
> would have little impact in the current code. Example below.
> 
> So far these alternatives only consider moving the release of the copy
> buffer in the IIO core but not its allocation. You also suggest to make
> the IIO core take care of the copy allocation. The problem I see with
> this is that if the copy is handled outside the driver it could take
> place concurrently with the modification of the original buffer since it
> would not be locked by driver private mutex, thus making the copy
> useless. This might be worked around by adding an additional optional
> callback (e.g. read_avail_will_copy/read_avail_is_mutable) to just take
> the size and check if a copy will be provided, so maybe something like:
> 
> fix-5) iio_read_channel_info_avail():
> {
>     ...
>     int *vals;
>     bool copy = false;
>     if (indio_dev->info->read_avail_will_copy) {
>         copy = indio_dev->info->read_avail_will_copy(..., &length, ...);
>         if (copy) {
>             vals = kcalloc(length, sizeof(int), GFP_KERNEL);
>         }
>     }
> 
>     indio_dev->info->read_avail(&vals, ...);
> 
>     if (ret < 0)
>             return ret;
>     switch (ret) {
>     case IIO_AVAIL_LIST:
>             ret = iio_format_avail_list(buf, vals, type, length);
>     case IIO_AVAIL_RANGE:
>             ret = iio_format_avail_range(buf, vals, type);
>     default:
>             ret = -EINVAL;
>     }
> 
>     if (copy)
>         kfree(vals);
> }
> 
> If I am not missing anything this could work and maybe it could also
> avoid the double copy on the consumers but would require all of them to
> wrap the read_avail_will_copy(). Also, I find it quite a weird pattern
> that in some cases vals would be an input buffer to be filled and in
> other cases it would be a return argument pointing to the const buffer
> stored inside the driver. At least I wouldn't say it's more robust than
> the current fix-1.
> 
> All these alternatives also prevents some potential optimization already
> mentioned before. Reporting it again as it is now lost in the mess below:
>     Some driver might want to avoid allocating a new copy of a big table if
>     the race does not occur (e.g. with additional checks on buffer access
>     code) and thus wouldn't call a free() in the release callback.
> 
> In the end I don't find any of the above alternatives to provide an
> obvious better solution.
> 
> > > > > > > > What I was referring to is that, back then, you mentioned you would have
> > > > > > > > preferred to avoid passing ownership of the buffer around:
> > > > > > > > 
> > > > > > > > > That's a corner case we should think about closing. Would require an indicator
> > > > > > > > > to read_avail that the buffer it has been passed is a snapshot that it should
> > > > > > > > > free on completion of the string building.  I don't like passing ownership
> > > > > > > > > of data around like that, but it is fiddly to do anything else given
> > > > > > > > > any simple double buffering is subject to race conditions.  
> > > > > > > > 
> > > > > > > > I guess there is some other reason other than avoiding the copy when not
> > > > > > > > necessary, since by introducing an additional function or argument or return
> > > > > > > > type, most of the unnecessary copies would already be avoided right?
> > > > > > > 
> > > > > > > It's not a strong reason beyond limiting scope of clever design +
> > > > > > > the key bit my mind is that the above is not substantially simpler and
> > > > > > > reduces our flexibility.
> > > > > > > 
> > > > > > > > Anyway any of this solutions would still prevent the potential optimizations of
> > > > > > > > point 2). It's worth mentioning that those kind of optimizations are currently
> > > > > > > > not adopted by any driver.
> > > > > > > 
> > > > > > > That one indeed not, but mixing dynamic and non dynamic is something
> > > > > > > you do in your pac1921 patch.
> > > > > > 
> > > > > > Good point! I didn't think about it, or more likely I forgot, that with an
> > > > > > additional read_avail_with_copy() used as in the example you cannot mix dynamic
> > > > > > and non dynamic available lists, thus those drivers that need at least one
> > > > > > dynamic available list would always copy all of them as they need to rely to
> > > > > > the read_avail_with_copy(). I guess this could be worked around with an
> > > > > > additional return argument for the read_avail() or an additional type like the
> > > > > > IIO_AVAIL_LIST_ALLOC suggested by Andy to signal the caller it needs to free
> > > > > > the list after use. However, I think they would introduce a more invasive
> > > > > > change in the current API compared to an additional optional callback,
> > > > 
> > > > It even sounds originally that it should be more invasive, so I don't think it's
> > > > a problem here.
> > > 
> > > In the hope it helps the discussion let me provide examples for the
> > > additional two options we have other than the current
> > > read_avail_release_resource() (fix-1) and the read_avail_with_copy()
> > > (fix-2) already shown above:
> > 
> > Thanks!
> > 
> > > fix-3) iio_read_channel_info_avail():
> > > {
> > >     ...
> > >     bool release_avail = false;
> > > 
> > >     ret = indio_dev->info->read_avail(vals, ..., &release_avail);
> > > 
> > >     ...
> > >     ret = iio_format_avail_list(vals, ...);
> > >     ...
> > > 
> > >     if (release_avail)
> > >         kfree(vals);
> > > 
> > >     return ret;
> > > }
> > > 
> > > 
> > > fix-4) iio_read_channel_info_avail():
> > > {
> > >     ...
> > >     indio_dev->info->read_avail(vals, ...);
> > > 
> > >     if (ret < 0)
> > >             return ret;
> > >     switch (ret) {
> > >     case IIO_AVAIL_LIST_ALLOC:
> > >             ret = iio_format_avail_list(buf, vals, type, length);
> > >             kfree(vals);
> > >             return ret;
> > >     case IIO_AVAIL_LIST:
> > >             return iio_format_avail_list(buf, vals, type, length);
> > >     case IIO_AVAIL_RANGE:
> > >             return iio_format_avail_range(buf, vals, type);
> > >     default:
> > >             return -EINVAL;
> > >     }
> > > }
> > > 
> > > > > > so I agree that the current release callback is still a better option.
> > > > 
> > > > I disagree on this as I pointed above why.
> > > > 
> > > > > > > > > > 2) Some driver might want to avoid allocating a new copy of a big table if
> > > > > > > > > >    the race does not occur (e.g. with additional checks on buffer access
> > > > > > > > > >    code) and thus wouldn't call a free() in the release callback.
> > > > > > > > > >   
> > > > > > > > > > > In any case it looks fragile and not scalable. I propose to drop this
> > > > > > > > > > > and think again.    
> > > > > > > > > > 
> > > > > > > > > > I see your concerns, I am open to reconsider this in case we come up with
> > > > > > > > > > better solution after addressing the points above.
> > > > > > > > > >   
> > > > > > > > > > > Yes, yes, I'm fully aware about the problem you are trying to solve and
> > > > > > > > > > > agree on the report, I think this solution is not good enough.
> > > > > > > > > > 
> > > > > > > > > > [1]: https://lore.kernel.org/linux-iio/20240729211100.0d602d6e@jic23-huawei/
> > > > > > > > 
> > > > > > > > I hope I've brought a little more clarity to the discussion by providing some
> > > > > > > > history instead of making it more confusing.
> > > > > > > 
> > > > > > > Sure, the code example in particular is useful.
> > > > > 
> > > > > Just a friendly reminder this has been sitting for a while, any news or
> > > > > additional considerations?
> > > > 
> > > > Moving the allocation control to the drivers will satisfy me as well, however
> > > > it makes even more duplication of the code, but at least it will be cleaner
> > > > design-wise in my opinion.
> > > 
> > > Would it work with the constraints on the info_exists lock mentioned
> > > above?
> > 
> > None of the given examples (fix-N) provides a lock, so I have no clue how it's
> > involved here. May be you can elaborate more?
> 
> I thought that with "Moving the allocation control to the drivers" you
> were referring to the option (not included among fix-N) to move the
> allocation of the consumer copy from the inkern iio_channel_read_avail()
> to the consumer drivers themselves. You elaborated this point above
> where I answered with the concerns about the info_exists lock that
> should be addressed.
> 
> > > > In any case the last word is on Jonathan.

-- 
With Best Regards,
Andy Shevchenko






[Index of Archives]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux