On Tue, 18 Apr 2023 15:36:35 +0200 Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > On Sun, 2023-04-16 at 14:33 +0100, Jonathan Cameron wrote: > > On Fri, 14 Apr 2023 11:11:43 +0200 > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > > Hi Jonathan, > > > > > > Thanks for the feedback. Definitely helpful... > > > > > > On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote: > > > > On Fri, 31 Mar 2023 16:40:44 +0200 > > > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > > > > > > Hi Jonathan, > > > > > > > > > > > > > Hmm. Complex topic, so I'd definitely like others to weigh in with > > > > opinions . > > > > > > > > > There are some major issues with the implementation we have upstream > > > > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the > > > > > rest of the email I will refer to ad9467 as the converter device and > > > > > axi-adc as the IP core. > > > > > > > > > > Let me start to state that these designs, in a very a basic way, have a > > > > > converter connected to a IP core that typically lives in a FPGA. This > > > > > core connects to a DMA core (which is used by the DMA buffer > > > > > implementation) so we can keep up with the high rates of these > > > > > converters. So there's a link between these devices and we try to > > > > > implement that so far looking at them as one single IIO device. > > > > > > > > > > Let's go to the major issues now: > > > > > > > > > > 1) There is a circular dependency between the two device. And when > > > > > compiled as modules for example, we cannot really rmmod the modules > > > > > anymore: > > > > > > > > > > "root@analog:~# rmmod ad9467 > > > > > rmmod: ERROR: Module ad9467 is in use > > > > > > > > > > root@analog:~# rmmod adi-axi-adc.ko > > > > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467 > > > > > " > > > > > > > > > > This easy to understand as the ad9467 driver as a direct symbol > > > > > dependency on the axi-adc one. And this one calls 'module_get()' as > > > > > soon as it attaches to a "client" > > > > > > > > Ouch. module_get() never works for this. Long time ago I thought it > > > > did :( An unbind will bypass that (as well any real hot unplug paths). > > > > > > > > > > Yeps, it bypasses it but I just wanted to point out the flaw in the current > > > design :) > > > > It was a surprise to me when Lars pointed this out years ago. > > All we can do is treat it as a hint for something that makes > > little sense for a user to deliberately do. > > > > > > > > > > > > > > > > > > > 2) It's fairly easy to crash the kernel: > > > > > > > > > > " > > > > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > > > > > > buffer0/in_voltage0_en > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable > > > > > [ 132.349133] 8<--- cut here --- > > > > > [ 132.352193] Unable to handle kernel paging request at virtual > > > > > address e0940000 when read > > > > > [ 132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000 > > > > > [ 132.366668] Internal error: Oops: 7 [#1] SMP ARM > > > > > [ 132.371289] Modules linked in: ad9467 adi_axi_adc ad9517 > > > > > [ 132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3 > > > > > [ 132.382701] Hardware name: Xilinx Zynq Platform > > > > > [ 132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88 > > > > > [adi_axi_adc] > > > > > [ 132.394020] LR is at arm_heavy_mb+0x1c/0x38 > > > > > [ 132.398212] pc : [<bf0060c4>] lr : [<c031820c>] psr: a0000013 > > > > > [ 132.404392] sp : e0929e30 ip : deaddead fp : c4430270 > > > > > [ 132.409678] r10: c8a0bc18 r9 : deaddeac r8 : c89b1c00 > > > > > [ 132.414895] r7 : c4430340 r6 : c45db7c0 r5 : e093ffc0 r4 : > > > > > 000003f0 > > > > > [ 132.421413] r3 : 00010000 r2 : e0940000 r1 : 00000000 r0 : > > > > > 00000000 > > > > > " > > > > > > > > It's nasty but generally that sort of path can be prevented with some > > > > careful locking and checking of indicators that a component has gone away. > > > > So whenever you unbind one aspect it notifies the other one that it has > > > > gone > > > > away. I'm not sure where to currently look for best practice in this. > > > > > > > > There are a lot of similar situations - basically anywhere a set of > > > > drivers end up hanging off two buses. v4l and media drivers in general > > > > end up doing this a lot. > > > > > > > > One I'm familiar with is how some of the CXL tear down happens and > > > > that lead me to device_release_driver() which is also used by usb etc. > > > > I've not looked at this for a while but it may provide the tear down > > > > needed > > > > if the right dance is used. I think that will work if your convertor > > > > driver is using services off the IP core driver and someone unbinds > > > > that ip-core driver. > > > > > > > > > > Yes, CCF does it with refcounting and some dummy clock ops for the case a > > > clock > > > provider is gone (with active consumers). > > > > That must be 'exciting' on occasion as no way to know if the clock disappeared > > or reset and hence the device interface might lock up. > > > > > > > > > > > > > > > > > > > 3) This solution does not really scale. There are designs where we have > > > > > multiple instances of the IP core attached to the client (typically if > > > > > a converter like this has more than one channel, we have one instance > > > > > of the core per channel). The way we have it know, that would not me > > > > > feasable. > > > > > > > > > > > ... > > > > > > > > > > > > > > > > However, one thing that > > > > > I'm not sure it will be ideal with having two devices is synchronizing > > > > > accesses to the IP core. For example, the axi-dac driver (that I also > > > > > intend to bring upstream as part of this work) also has an userspace > > > > > interface so we need to be carefull. > > > > > > > > I'm a bit lost. I think we'd only expect to see a userspace interface > > > > on the 'front end' driver. Is there something on the driver handling the > > > > IP core as well (beyond debug which you can do whatever you like with :) > > > > > > > > > > See below... > > > > > > > > But I guess this is no different > > > > > for devices which are accessed by the in-kernel IIO interface. > > > > > > > > > > I'm way more inclined to go with 2). > > > > > > > > I'm not sure I fully follow what option 2) is! Perhaps a bit > > > > of ascii art? > > > > > > > > > > So option 2 is pretty much treating the IP core also as a separate IIO > > > device. > > > This is not so visible in axi-adc but on the dac counterpart, the settings > > > to generate tones (frequency/phase) are on the core. So, this could map to > > > IIO ABI. > > > > Those are of interest to which bit of analog circuitry? The big beyond the > > front end? If so fine to provide that via the IIO interfaces, but the > > front end device should proxy them (using the IIO in kernel consumer > > interfaces > > - which probably needs some work to make them safe to the provider going away. > > From below, seems that a more focused interface for this particular device > > representation may make sense. > > > > Not sure I'm following your question but I'll try... The settings are on the IP > core beyond the front end but I would say they are of interest of the front end > device. For instance, we might generate a tone on the IP core (setting phase and > frequency) but these bits are to be transmitted trough a dac/dds (our front end > device) analog, digital front end. hehe. I avoided opening your email for a few weeks because I thought it might drag me down a rat hole. Then I find there is just a nice clean reply to my question that would have taken me 2 mins to read. That makes complete sense. To my mind only the front end needs to be an IIO device, but if it's convenient to represent the back end as one we can probably make that work as well. Jonathan > > - Nuno Sá > > >