Re: [RFC PATCH 1/3] iio: addac: add new converter framework

Nuno Sá <noname.nuno@xxxxxxxxx> · Mon, 04 Sep 2023 16:14:17 +0200

On Sun, 2023-09-03 at 11:56 +0100, Jonathan Cameron wrote:
> On Thu, 31 Aug 2023 11:32:54 +0200
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> 
> > On Wed, 2023-08-30 at 18:02 +0100, Jonathan Cameron wrote:
> > > On Fri, 4 Aug 2023 16:53:39 +0200
> > > Nuno Sa <nuno.sa@xxxxxxxxxx> wrote:
> > >   
> > > > Signed-off-by: Nuno Sa <nuno.sa@xxxxxxxxxx>  
> > > 
> > > Hi Nuno,
> > >   
> > 
> > Hi Jonathan,
> > 
> > Thanks for the initial review...
> > 
> > > 
> > > One general comment is that you could have stripped this back a fair bit
> > > for ease of understanding.  At this stage we don't care about things
> > > like debug or control of test patterns.  Bring those in as extra patches.
> > >   
> > 
> > Agreed... As I mentioned (I think) in the cover, I made the RFC bigger than
> > needed to
> > kind of showcase how we can properly configure the hdl core to support
> > things
> > (interface calibration) that were very hard to do with the current
> > implementation.
> > I'll make sure to add the minimum needed API to accommodate what we have
> > right now.
> > 
> > > I haven't fully gotten my head around the ordering constraints on removal.
> > > Are there other users of the component framework that have similar
> > > problems?
> > >   
> > 
> > My understanding on the component API is that one should do all the tear
> > down in the
> > .unbind() callback. As usual, I can see some drivers not really doing that.
> > 
> > > Also, I don't yet understand how a multiple front end, single backend
> > > setup
> > > would work.  Or indeed single front end, multiple backend...  Maybe we
> > > don't
> > > need those cases, but if we want this to be useful beyond adi-axi we
> > > probably at least want an outline of how they work.
> > >   
> > 
> > Indeed we can have multiple (and we have it out of tree) backends on one
> > frontend.
> > Think on an ADC/DAC with fairly complex data path with more than one
> > channel/interface (CMOS, LVDS, etc). Typically, in those case, each of the
> > interface
> > will be connected to an instance of the hdl core (the backend).
> 
> That might work out for your case, but not the stm32 one where I think we can
> end
> up with interleaved data from two front ends in the same buffer...
> 

Not sure I'm following this one. But wouldn't that be something specific for
each system (through devicetree)? I haven't tried but I think the same backend
could be used in different frontend devices (using the component API). That is
not really a usecase for me but definitely something that could be supported (if
we need to start doing things like keep enable/disable counters and so on) if it
is a usecase for stm32.

> > 
> > > Jonathan
> > >   
> > > > ---
> > > >  drivers/iio/addac/converter.c       | 547 ++++++++++++++++++++++++++++
> > > >  include/linux/iio/addac/converter.h | 485 ++++++++++++++++++++++++
> > > >  2 files changed, 1032 insertions(+)
> > > >  create mode 100644 drivers/iio/addac/converter.c
> > > >  create mode 100644 include/linux/iio/addac/converter.h
> > > > 
> > > > diff --git a/drivers/iio/addac/converter.c
> > > > b/drivers/iio/addac/converter.c
> > > > new file mode 100644
> > > > index 000000000000..31ac704255ad
> > > > --- /dev/null
> > > > +++ b/drivers/iio/addac/converter.c
> > > > @@ -0,0 +1,547 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > +/*
> > > > + * Framework to handle complex IIO aggregate devices
> > > > + *
> > > > + * A note on some of the design expectations with regards to lifetimes
> > > > and
> > > > + * devices bringup/removal.
> > > > + *
> > > > + * The Framework is using, under the wood, the component API which
> > > > makes it to
> > > > + * easy treat a bunch of devices as one aggregate device. This means
> > > > that the
> > > > + * complete thing is only brought to live when all the deviced are
> > > > probed. To do  
> > > 
> > > devices
> > >   
> > > > + * this, two callbacks are used that should in fact completely replace
> > > > .probe()
> > > > + * and .remove(). The formers should only be used the minimum needed.
> > > > Ideally,  
> > > 
> > > I don't follow the sentence in the middle of the line above.
> > >   
> > > > + * only to call the functions to add and remove frontend annd backend
> > > > devices.  
> > > Spell check...
> > >   
> > > > + *
> > > > + * It is advised for frontend and backend drivers to use their
> > > > .remove()  
> > > 
> > > I'd not 'advise' things. I'd say the 'use' them.
> > >   
> > > > + * callbacks (to use devres API during the frontend and backends
> > > > initialization).
> > > > + * See the comment in @converter_frontend_bind().
> > > > + *
> > > > + * It is also assumed that converter objects cannot be accessed once
> > > > one of the
> > > > + * devices of the aggregate device is removed (effectively bringing the
> > > > all the  
> > > 
> > > bringing all the devices down
> > >   
> > > > + * devices down). Based on that assumption, these objects are not
> > > > refcount which  
> > > 
> > > recounted
> > >   
> > > > + * means accessing them will likely fail miserably.  
> > > 
> > > I hope that doesn't mean there will be no protection.  I don't mind if
> > > nothing
> > > works
> > > but breaking the kernel isn't an option.
> > >   
> > 
> > Hmm, well, you'll have a use after free... But one will have to be creative
> > to use
> > one of these objects after releasing the device from the driver (on the
> > unbind path).
> > And here we don't have any interaction with chardevs, etc which might keep
> > references
> > to devices even after unbind.
> > 
> > The only place where I can see someone doing it wrong is from a frontend
> > driver if
> > for some reason (that I cannot think of now) we need to keep references/use
> > 'struct
> > converter' after .frontend_close() is called. In that case and if the
> > backend driver
> > was the one being removed/unbind, the converter object will effectively be
> > freed (as
> > it was allocated with devres) and we are left with a possible use after
> > free. But
> > that would be a very strange usecase to be missed in review (I hope :)).
> > 
> > We can always refcount the converters (not sure if we need to do it for
> > frontend
> > devices). Sure, drivers can still screw up but at least in that case, the
> > framework
> > is not to blame :).
> 
> If the rules are clearly stated (with some reasoning) I don't think we need
> to care about saying what happens if you break them.  People will always shoot
> themselves in the foot, but as long as it is reasonably fiddly to do that's
> fine by me :)
> 
> ...
> 
> > > > +static int converter_frontend_bind(struct device *dev)
> > > > +{
> > > > +       struct converter_frontend *frontend = dev_get_drvdata(dev);
> > > > +       int ret;
> > > > +
> > > > +       ret = component_bind_all(dev, NULL);
> > > > +       if (ret)
> > > > +               return ret;
> > > > +       /*
> > > > +        * We open a new group so that we can control when resources are
> > > > +        * released and still use device managed (devm_) calls. The
> > > > expectations
> > > > +        * are that on probe, backend resources are allocated first
> > > > followed by
> > > > +        * the frontend resources (where registering the IIO device must
> > > > happen)
> > > > +        * Naturally we want the reverse order on the unbind path and
> > > > that would
> > > > +        * not be possible without opening our own devres group.
> > > > +
> > > > +        * Note that the component API also opens it's own devres group
> > > > when
> > > > +        * calling the .bind() callbacks for both the aggregate device
> > > > +        * (our frontend) and each of the components (our backends). On
> > > > the
> > > > +        * unbind path, the aggregate .unbind() function is called
> > > > +        * (@converter_frontend_unbind()) which should be responsible to
> > > > tear
> > > > +        * down all the components (effectively releasing all the
> > > > resources
> > > > +        * allocated on each component devres group) and only then the
> > > > aggregate
> > > > +        * devres group is released. Hence, the order we want to
> > > > maintain for
> > > > +        * releasing resources would not be satisfied because backend
> > > > resources
> > > > +        * would be freed first. With our own group, we can control when
> > > > +        * releasing the resources and we do it before
> > > > @component_unbind_all().
> > > > +        *
> > > > +        * This also relies that internally the component API is
> > > > releasing each
> > > > +        * of the component's devres group. That is likely not to
> > > > change, but
> > > > +        * maybe we should not trust it and also open our own groups for
> > > > backend
> > > > +        * devices?!
> > > > +        *
> > > > +        * Another very important thing to keep in mind is that this is
> > > > only
> > > > +        * valid if frontend and backend driver's are implementing their
> > > > +        * .remove() callback to call @converter_frontend_del() and
> > > > +        * @converter_backend_del(). Calling those functions from
> > > > +        * devm_add_action* and use devm APIs in .frontend_init() and
> > > > +        * .backend_init() is not going to work. Not perfect but still
> > > > better
> > > > +        * than having to tear everything down in .frontend_close() and
> > > > +        * .backend_close()  
> > > 
> > > That last bit is nasty and will be non obvious to driver authors.
> > > 
> > > I wonder if we can come up with some means to make it hard to do.
> > >   
> > 
> > Yeah, I agree. The alternative is to always bring everything down in
> > .frontend_close() and .backend_close(). But that can also be prone to subtle
> > bugs
> > because it's easy to mess up the ordering when not using devres.
> > 
> > So, at this point, I cannot really think on a perfect solution rather than
> > keeping
> > some rules like (assuming we keep the logic we have now):
> > 
> > * Using devres on frontend|backend_init() only when .remove() is provided on
> > the
> > driver.
> > * No mixes of devres and .frontend|backend_close()
> > 
> > But yeah, would be nice if we could come up with something to make it more
> > obvious to
> > driver authors.
> 
> > 
> > We might be able to detect that converter_backend_del() and
> > converter_frontend_del()
> > are under devres while no .frontend|backend_close() is being given. I guess
> > that
> > could be a valid indicator of likely misusage.
> > 
> > Or even better (but I'm not sure it's doable with the current devres API),
> > detecting
> > that converter_backend_del() or converter_frontend_del() are under devres
> > while more
> > resources are also allocated in our specific opened groups. That would
> > always be a
> > problem (I think) because the only way for the _del() functions to be under
> > devres is
> > if someone added them (from .probe) with devm_add_action() which means that
> > tearing
> > down the aggregate will happen after some resources (which were allocated in
> > the
> > _init() function) are already freed (as even with new groups, devres will
> > remove
> > things on the reverse order). And that would defenitely be problematic. And,
> > in fact,
> > is the whole reason why I have the .del() functions on .remove() (so,
> > tearing down
> > the aggregate device is the first thing to happen and resources are freed in
> > the
> > reverse order they were allocated).
> > 
> 

This would actually be very messy and hard to do properly. Concurrency between
the aggregate probing (at the component level) and unbinding (at driver core
level) would be very tricky if doable at all. On top that, we do have
devres_find() but no way to tell if a devres group has resources or not. It
would be easy to add one new API but likely not worth it just for this usecase.

I also thought about an helper macro to wrap every devm_ call but your below
suggestion is way better and transparent to users.

> I couldn't work out how to do anything easily and would need some experiments.
> Maybe some 'hidden' devres callbacks and a state flag somewhere.  If we
> register
> that very late we can perhaps detect that we entered devres cleanup before
> calling
> expected manual cleanup.  I'm thinking have the setup path register a flag
> checking
> callback and the cleanup path set a flag (devres now safe).  Then we can at
> least
> make it scream if we end up doing things in wrong way.
> 

Hmm, that might actually be a good idea and something to try. It likely means
having a mutex (I was happy not to have one for now :) but likely inevitable
somewhere down the road) for the flag but it might work. At least we'll be able
to dump a WARN or something if we suspect something is wrong.

> > Other thought would be some generic helper macros to use in these type of
> > drivers so
> > a .remove() callback is always added to remove the components. 
> I wondered if that could work but it's an ugly macro because needs to deal
> with
> different bus types.
> 
> 

We could have a macro per bus type. I'm not really seeing anything different
than platform, spi and i2c. But even like this, it could easily start to get
ugly because of accepted parameters (and any deviation would again mean a
different version per bus type). Another thing that crossed my mind was wrappers
to module_spi_driver() and friends. Maybe that could work. Anyways, as you said
in one of your replies, this is in kernel interface that we can easily
add/change. So, I will defer this to a later point in time.

- Nuno Sá