Re: adi-axi-adc issues and how to properly support this designs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jonathan,

Thanks for the feedback. Definitely helpful...

On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:
> On Fri, 31 Mar 2023 16:40:44 +0200
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> 
> > Hi Jonathan,
> > 
> 
> Hmm. Complex topic, so I'd definitely like others to weigh in with
> opinions .
> 
> > There are some major issues with the implementation we have upstream
> > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > rest of the email I will refer to ad9467 as the converter device and
> > axi-adc as the IP core. 
> > 
> > Let me start to state that these designs, in a very a basic way, have a
> > converter connected to a IP core that typically lives in a FPGA. This
> > core connects to a DMA core (which is used by the DMA buffer
> > implementation) so we can keep up with the high rates of these
> > converters. So there's a link between these devices and we try to
> > implement that so far looking at them as one single IIO device.
> > 
> > Let's go to the major issues now:
> > 
> > 1) There is a circular dependency between the two device. And when
> > compiled as modules for example, we cannot really rmmod the modules
> > anymore:
> > 
> > "root@analog:~# rmmod ad9467
> > rmmod: ERROR: Module ad9467 is in use
> > 
> > root@analog:~# rmmod adi-axi-adc.ko
> > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > "
> > 
> > This easy to understand as the ad9467 driver as a direct symbol
> > dependency on the axi-adc one. And this one calls 'module_get()' as
> > soon as it attaches to a "client"
> 
> Ouch. module_get() never works for this.  Long time ago I thought it
> did :(  An unbind will bypass that (as well any real hot unplug paths).
> 

Yeps, it bypasses it but I just wanted to point out the flaw in the current design :)

> 
> > 
> > 2) It's fairly easy to crash the kernel:
> > 
> > "
> > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > buffer0/in_voltage0_en
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > [  132.349133] 8<--- cut here ---
> > [  132.352193] Unable to handle kernel paging request at virtual
> > address e0940000 when read
> > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > [  132.382701] Hardware name: Xilinx Zynq Platform
> > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > [adi_axi_adc]
> > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > 000003f0
> > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > 00000000
> > "
> 
> It's nasty but generally that sort of path can be prevented with some
> careful locking and checking of indicators that a component has gone away.
> So whenever you unbind one aspect it notifies the other one that it has gone
> away. I'm not sure where to currently look for best practice in this.
> 
> There are a lot of similar situations - basically anywhere a set of 
> drivers end up hanging off two buses.  v4l and media drivers in general
> end up doing this a lot.
> 
> One I'm familiar with is how some of the CXL tear down happens and
> that lead me to device_release_driver() which is also used by usb etc.
> I've not looked at this for a while but it may provide the tear down needed
> if the right dance is used.  I think that will work if your convertor
> driver is using services off the IP core driver and someone unbinds
> that ip-core driver.
> 

Yes, CCF does it with refcounting and some dummy clock ops for the case a clock
provider is gone (with active consumers).

> 
> > 
> > 3) This solution does not really scale. There are designs where we have
> > multiple instances of the IP core attached to the client (typically if
> > a converter like this has more than one channel, we have one instance
> > of the core per channel). The way we have it know, that would not me
> > feasable.
> > 

...

> 
> 
> > However, one thing that
> > I'm not sure it will be ideal with having two devices is synchronizing
> > accesses to the IP core. For example, the axi-dac driver (that I also
> > intend to bring upstream as part of this work) also has an userspace
> > interface so we need to be carefull.
> 
> I'm a bit lost. I think we'd only expect to see a userspace interface
> on the 'front end' driver. Is there something on the driver handling the
> IP core as well (beyond debug which you can do whatever you like with :)
> 

See below... 

> > But I guess this is no different
> > for devices which are accessed by the in-kernel IIO interface.
> > 
> > I'm way more inclined to go with 2).
> 
> I'm not sure I fully follow what option 2) is!  Perhaps a bit
> of ascii art?
> 

So option 2 is pretty much treating the IP core also as a separate IIO device.
This is not so visible in axi-adc but on the dac counterpart, the settings
to generate tones (frequency/phase) are on the core. So, this could map to
IIO ABI.

However, I guess we can provide some callback/ops to read_raw()/write_raw() 
to extend the front end converter (This is what is done today upstream but
in the wrong direction). In this way, I think we could still only expose the
frond end device.

> > 
> > Another thing that I've been thinking is how should we implement the IP
> > core driver... As stated, there are usecases where we have mulitple
> > instances of these devices. Well, that is how we've been doing things
> > in ADI tree but essentially the core is just one big "block/device"
> > [1]. However, the way we've been doing things is actually ok because
> > every sub-block on these IPs have their own memory map allowing to
> > control everything needed by the sub-blocks.
> 
> Device model wise, that sounds like something that would be neater split
> up. So one kernel device registered per sub-block.  That should make
> it easier to handle referencing them from front end drivers.  You can
> also then unbind individual blocks etc rather than having to do them
> in one go.  If there is need for a 'parent' block then this might be
> a fit for the auxiliary bus framework - or something that looks like
> that but suits your particular requirements.
> 

Agreed. Instead of auxiliary bus, I've been thinking in two option:
 * Normal devicetree (FW) + kref based approach (as CCF, in kernel IIO, etc)
 * component API [1]. The component API looks to fit nicely in these designs
and I would not have to care about refcounting. It's also an all or nothing
approach. Either all devices are present or none is which honestly, I think
it makes sense. I also don't dislike the two staged binding approach... One thing
that worries me is that it looks to be very tailored for DRM (there's even some
DRM specific comments in the code :)) which might bring some unexpected, unpleasant
surprises.

> > 
> > Another option and now that we support output buffers and multiple
> > buffers per device would be to support these cores as one device
> > (likely in the addac folder).
> > However, I think that keeping each block
> > as a standalone device might keep things more simple. But I assume it's
> > tempting to get a first user to the multi buffer support :)
> 
> Do we end up with with one front end, with a bunch of IP cores (or sub cores etc)
> behind it?  If so I'd like the front end (where buffers are presented)
> to have a bunch of output buffers.  I'd assume that there are controls
> for some front ends that affect the data flowing into multiple IP
> blocks? (gain etc).  That front end (which would be the IIO driver)
> might well register for multiple backends (IP block driver).
> 

Yes. Think of high speed ADCs/DACs with more than one channel/
data paths. Typically each interface (LVDS/CMOS/JESD) of that path is connected
to the fpga mapping to it's own sub IP block. So yes, you're right. In this
case, as we are moving more for the "front end" approach, we will still have
users for multi buffer support :).

> > 
> > Now for the big elephant of all the above... Everything that I've been
> > saying means breaking the current ABI. I'm fairly confident that we
> > should not have any upstream user for these drivers because it lacks a
> > lot of features that are only present in ADI tree. So, I think it's ok
> > to destroy the ABI but of course I cannot guarantee there are no users
> > :).
> 
> I'm less worried about doing this for this driver than the vast majority
> of others because at least for upstream I don't think we'll have huge
> numbers of users.  + they'll shout at you not me ;)
> 
> > 
> > So, would it be fine for me to not care about the current ABI while
> > working on this? Because, it will be very difficult to come up with
> > something that is compatible with what we have. Well, there's always
> > the option of coming up with a -v2 (or something like that) for these
> > drivers keeping the ones we have now, untouched. Ideas?
> 
> Probably good to layout the current ABI as exposed and then we can see
> whether what we are changing is stuff not used in normal software flows
> anyway or if we are going to break critical stuff.  IF we end up with
> a /sys/bus/iio/device/iio:deviceX that has the same interfaces, then
> the user won't care that the driver structure underneath is totally different.
> I don't care about debug related interfaces remaining stable ABI

Maybe with this front end device logic where we are only still exposing one
IIO device, we we'll end up not breaking that much the ABI (if breaking at all).
> 
> Going for a v2 may be the best option - marking the other one deprecated
> and getting rid of it after a number of years.  If you (ADI in general)
> reckon you can get away with it then I'm happy.  The usecases for this
> stuff tend to be sufficiently high end that I'd imagine you only have to
> deal with a fairly small number of customers and many of them won't
> use upstream directly anyway.
> 
> > 
> > So... feedback is very much appreciated :)
> 
> I think unfortunately this is going to be an area where experimentation
> is needed, particularly as I suspect you have a lot of different devices all
> with subtle differences in requirements. We need soemthing that supports
> them all.
> 
> For extra fun. In the ideal sense any driver for a convertor should work
> equally well with different IP cores (if any ever come along).
> 

Yeah, that is actually my goal. In my head, I'm planning to have a tiny
middleware module where IP core drivers register with some ops() (that we
can always extend) and front end devices get() these cores and then call
the exposed API. Then, ideally, this module should not have any specific
information about the IP core specific internals (naturally some ops/API
might be more tailored to some drivers but we should try to keep them as
generic as possible)... 

[1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/component.c
- Nuno Sá





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Input]     [Linux Kernel]     [Linux SCSI]     [X.org]

  Powered by Linux