Re: [RFC PATCH 1/3] iio: addac: add new converter framework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 31 Aug 2023 11:32:54 +0200
Nuno Sá <noname.nuno@xxxxxxxxx> wrote:

> On Wed, 2023-08-30 at 18:02 +0100, Jonathan Cameron wrote:
> > On Fri, 4 Aug 2023 16:53:39 +0200
> > Nuno Sa <nuno.sa@xxxxxxxxxx> wrote:
> >   
> > > Signed-off-by: Nuno Sa <nuno.sa@xxxxxxxxxx>  
> > 
> > Hi Nuno,
> >   
> 
> Hi Jonathan,
> 
> Thanks for the initial review...
> 
> > 
> > One general comment is that you could have stripped this back a fair bit
> > for ease of understanding.  At this stage we don't care about things
> > like debug or control of test patterns.  Bring those in as extra patches.
> >   
> 
> Agreed... As I mentioned (I think) in the cover, I made the RFC bigger than needed to
> kind of showcase how we can properly configure the hdl core to support things
> (interface calibration) that were very hard to do with the current implementation.
> I'll make sure to add the minimum needed API to accommodate what we have right now.
> 
> > I haven't fully gotten my head around the ordering constraints on removal.
> > Are there other users of the component framework that have similar problems?
> >   
> 
> My understanding on the component API is that one should do all the tear down in the
> .unbind() callback. As usual, I can see some drivers not really doing that.
> 
> > Also, I don't yet understand how a multiple front end, single backend setup
> > would work.  Or indeed single front end, multiple backend...  Maybe we don't
> > need those cases, but if we want this to be useful beyond adi-axi we
> > probably at least want an outline of how they work.
> >   
> 
> Indeed we can have multiple (and we have it out of tree) backends on one frontend.
> Think on an ADC/DAC with fairly complex data path with more than one
> channel/interface (CMOS, LVDS, etc). Typically, in those case, each of the interface
> will be connected to an instance of the hdl core (the backend).

That might work out for your case, but not the stm32 one where I think we can end
up with interleaved data from two front ends in the same buffer...

> 
> > Jonathan
> >   
> > > ---
> > >  drivers/iio/addac/converter.c       | 547 ++++++++++++++++++++++++++++
> > >  include/linux/iio/addac/converter.h | 485 ++++++++++++++++++++++++
> > >  2 files changed, 1032 insertions(+)
> > >  create mode 100644 drivers/iio/addac/converter.c
> > >  create mode 100644 include/linux/iio/addac/converter.h
> > > 
> > > diff --git a/drivers/iio/addac/converter.c b/drivers/iio/addac/converter.c
> > > new file mode 100644
> > > index 000000000000..31ac704255ad
> > > --- /dev/null
> > > +++ b/drivers/iio/addac/converter.c
> > > @@ -0,0 +1,547 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Framework to handle complex IIO aggregate devices
> > > + *
> > > + * A note on some of the design expectations with regards to lifetimes and
> > > + * devices bringup/removal.
> > > + *
> > > + * The Framework is using, under the wood, the component API which makes it to
> > > + * easy treat a bunch of devices as one aggregate device. This means that the
> > > + * complete thing is only brought to live when all the deviced are probed. To do  
> > 
> > devices
> >   
> > > + * this, two callbacks are used that should in fact completely replace .probe()
> > > + * and .remove(). The formers should only be used the minimum needed. Ideally,  
> > 
> > I don't follow the sentence in the middle of the line above.
> >   
> > > + * only to call the functions to add and remove frontend annd backend devices.  
> > Spell check...
> >   
> > > + *
> > > + * It is advised for frontend and backend drivers to use their .remove()  
> > 
> > I'd not 'advise' things. I'd say the 'use' them.
> >   
> > > + * callbacks (to use devres API during the frontend and backends
> > > initialization).
> > > + * See the comment in @converter_frontend_bind().
> > > + *
> > > + * It is also assumed that converter objects cannot be accessed once one of the
> > > + * devices of the aggregate device is removed (effectively bringing the all the  
> > 
> > bringing all the devices down
> >   
> > > + * devices down). Based on that assumption, these objects are not refcount which  
> > 
> > recounted
> >   
> > > + * means accessing them will likely fail miserably.  
> > 
> > I hope that doesn't mean there will be no protection.  I don't mind if nothing
> > works
> > but breaking the kernel isn't an option.
> >   
> 
> Hmm, well, you'll have a use after free... But one will have to be creative to use
> one of these objects after releasing the device from the driver (on the unbind path).
> And here we don't have any interaction with chardevs, etc which might keep references
> to devices even after unbind.
> 
> The only place where I can see someone doing it wrong is from a frontend driver if
> for some reason (that I cannot think of now) we need to keep references/use 'struct
> converter' after .frontend_close() is called. In that case and if the backend driver
> was the one being removed/unbind, the converter object will effectively be freed (as
> it was allocated with devres) and we are left with a possible use after free. But
> that would be a very strange usecase to be missed in review (I hope :)).
> 
> We can always refcount the converters (not sure if we need to do it for frontend
> devices). Sure, drivers can still screw up but at least in that case, the framework
> is not to blame :).

If the rules are clearly stated (with some reasoning) I don't think we need
to care about saying what happens if you break them.  People will always shoot
themselves in the foot, but as long as it is reasonably fiddly to do that's
fine by me :)

...

> > > +static int converter_frontend_bind(struct device *dev)
> > > +{
> > > +       struct converter_frontend *frontend = dev_get_drvdata(dev);
> > > +       int ret;
> > > +
> > > +       ret = component_bind_all(dev, NULL);
> > > +       if (ret)
> > > +               return ret;
> > > +       /*
> > > +        * We open a new group so that we can control when resources are
> > > +        * released and still use device managed (devm_) calls. The expectations
> > > +        * are that on probe, backend resources are allocated first followed by
> > > +        * the frontend resources (where registering the IIO device must happen)
> > > +        * Naturally we want the reverse order on the unbind path and that would
> > > +        * not be possible without opening our own devres group.
> > > +
> > > +        * Note that the component API also opens it's own devres group when
> > > +        * calling the .bind() callbacks for both the aggregate device
> > > +        * (our frontend) and each of the components (our backends). On the
> > > +        * unbind path, the aggregate .unbind() function is called
> > > +        * (@converter_frontend_unbind()) which should be responsible to tear
> > > +        * down all the components (effectively releasing all the resources
> > > +        * allocated on each component devres group) and only then the aggregate
> > > +        * devres group is released. Hence, the order we want to maintain for
> > > +        * releasing resources would not be satisfied because backend resources
> > > +        * would be freed first. With our own group, we can control when
> > > +        * releasing the resources and we do it before @component_unbind_all().
> > > +        *
> > > +        * This also relies that internally the component API is releasing each
> > > +        * of the component's devres group. That is likely not to change, but
> > > +        * maybe we should not trust it and also open our own groups for backend
> > > +        * devices?!
> > > +        *
> > > +        * Another very important thing to keep in mind is that this is only
> > > +        * valid if frontend and backend driver's are implementing their
> > > +        * .remove() callback to call @converter_frontend_del() and
> > > +        * @converter_backend_del(). Calling those functions from
> > > +        * devm_add_action* and use devm APIs in .frontend_init() and
> > > +        * .backend_init() is not going to work. Not perfect but still better
> > > +        * than having to tear everything down in .frontend_close() and
> > > +        * .backend_close()  
> > 
> > That last bit is nasty and will be non obvious to driver authors.
> > 
> > I wonder if we can come up with some means to make it hard to do.
> >   
> 
> Yeah, I agree. The alternative is to always bring everything down in
> .frontend_close() and .backend_close(). But that can also be prone to subtle bugs
> because it's easy to mess up the ordering when not using devres.
> 
> So, at this point, I cannot really think on a perfect solution rather than keeping
> some rules like (assuming we keep the logic we have now):
> 
> * Using devres on frontend|backend_init() only when .remove() is provided on the
> driver.
> * No mixes of devres and .frontend|backend_close()
> 
> But yeah, would be nice if we could come up with something to make it more obvious to
> driver authors.

> 
> We might be able to detect that converter_backend_del() and converter_frontend_del()
> are under devres while no .frontend|backend_close() is being given. I guess that
> could be a valid indicator of likely misusage.
> 
> Or even better (but I'm not sure it's doable with the current devres API), detecting
> that converter_backend_del() or converter_frontend_del() are under devres while more
> resources are also allocated in our specific opened groups. That would always be a
> problem (I think) because the only way for the _del() functions to be under devres is
> if someone added them (from .probe) with devm_add_action() which means that tearing
> down the aggregate will happen after some resources (which were allocated in the
> _init() function) are already freed (as even with new groups, devres will remove
> things on the reverse order). And that would defenitely be problematic. And, in fact,
> is the whole reason why I have the .del() functions on .remove() (so, tearing down
> the aggregate device is the first thing to happen and resources are freed in the
> reverse order they were allocated).
> 

I couldn't work out how to do anything easily and would need some experiments.
Maybe some 'hidden' devres callbacks and a state flag somewhere.  If we register
that very late we can perhaps detect that we entered devres cleanup before calling
expected manual cleanup.  I'm thinking have the setup path register a flag checking
callback and the cleanup path set a flag (devres now safe).  Then we can at least
make it scream if we end up doing things in wrong way.

> Other thought would be some generic helper macros to use in these type of drivers so
> a .remove() callback is always added to remove the components. 
I wondered if that could work but it's an ugly macro because needs to deal with
different bus types.

> 
> Anyways, even the above might be tricky enough to not include it. I'll give it some
> thought.

Agreed - seems fiddly but maybe there is a neat trick to it.
Or indeed another subsystem already doing something.

> > > +#define converter_test_pattern_xlate(pattern, xlate)   \
> > > +       __converter_test_pattern_xlate(pattern, xlate, ARRAY_SIZE(xlate));
> > > +
> > > +#if IS_ENABLED(CONFIG_IIO_CONVERTER)  
> > 
> > Why?  I'd expect any driver that uses this framework to be useless without
> > it, so we shouldn't need protections. Handle that with Kconfig select / depends
> >   
> 
> Well, we do have cases of frontends that might be used in standalone mode (I mean,
> with no backend device) or with the backend connected. But alright, I will keep
> things simple for now and let's take care if such case ever get's upstream (hopefully
> they eventual do :)).

Yeah. Introduce the stubs when you need them ;)

...

> > > +
> > > +/**
> > > + * converter_frontend_del - Remove the frontend device
> > > + * @dev:       Device to remove from the aggregate
> > > + *
> > > + * Removes the frontend from the aggregate device. This tears down the frontend
> > > + * and all the converters.
> > > + *
> > > + * Ideally, this should be called from the frontend driver .remove() callback.
> > > + * This means that all the converters (and the frontend) will be tear down  
> > torn down   
> > > + * before running any specific devres cleanup (at the driver core level). What
> > > + * this all means is that we can use devm_ apis in .frontend_init() and being
> > > + * sure those resources will be released after the backend resources and before
> > > + * any devm_* used in .probe(). If that is not the case, one should likely not
> > > + * use any devm_ API in .frontend_init(). That means .frontend_close() should be
> > > + * provided to do all the necessary cleanups.  
> > 
> > You can force a driver remove to tear down another driver binding first though it
> > all
> > gets fiddly.  Take a look at how device_release_driver() is used.  May well not
> > help you here though - I've not thought it through properly.
> >   
> 
> Yeah, I know but I don't think it would help us here or even be correct and I think
> the problem would be the same. I mean, we would still need to call
> converter_frontend_del() or converter_backend_del() from the respective .remove()
> callbacks.
> 
Agreed. I'm also not sure it helps and low on time at the moment to experiment around this.

Jonathan




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Input]     [Linux Kernel]     [Linux SCSI]     [X.org]

  Powered by Linux