Hi Jonathan,

Thanks for the feedback. Definitely helpful...

On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:
> On Fri, 31 Mar 2023 16:40:44 +0200
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> 
> > Hi Jonathan,
> > 
> Hmm. Complex topic, so I'd definitely like others to weigh in with
> opinions.
> 
> > There are some major issues with the implementation we have upstream
> > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > rest of the email I will refer to ad9467 as the converter device and
> > axi-adc as the IP core.
> > 
> > Let me start by stating that these designs, in a very basic way, have
> > a converter connected to an IP core that typically lives in an FPGA.
> > This core connects to a DMA core (which is used by the DMA buffer
> > implementation) so we can keep up with the high rates of these
> > converters. So there's a link between these devices and, so far, we
> > try to implement that by looking at them as one single IIO device.
> > 
> > Let's go through the major issues now:
> > 
> > 1) There is a circular dependency between the two devices. When
> > compiled as modules, for example, we cannot really rmmod the modules
> > anymore:
> > 
> > "root@analog:~# rmmod ad9467
> > rmmod: ERROR: Module ad9467 is in use
> > 
> > root@analog:~# rmmod adi-axi-adc.ko
> > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > "
> > 
> > This is easy to understand as the ad9467 driver has a direct symbol
> > dependency on the axi-adc one. And this one calls 'module_get()' as
> > soon as it attaches to a "client".
> 
> Ouch. module_get() never works for this. Long time ago I thought it
> did :( An unbind will bypass that (as well as any real hot unplug paths).
> 

Yep, it bypasses it, but I just wanted to point out the flaw in the
current design :)

> > 2) It's fairly easy to crash the kernel:
> > 
> > "
> > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/in_voltage0_en
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > [ 132.349133] 8<--- cut here ---
> > [ 132.352193] Unable to handle kernel paging request at virtual
> > address e0940000 when read
> > [ 132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > [ 132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > [ 132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > [ 132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > [ 132.382701] Hardware name: Xilinx Zynq Platform
> > [ 132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88 [adi_axi_adc]
> > [ 132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > [ 132.398212] pc : [<bf0060c4>]  lr : [<c031820c>]  psr: a0000013
> > [ 132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > [ 132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > [ 132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 : 000003f0
> > [ 132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 : 00000000
> > "
> 
> It's nasty but generally that sort of path can be prevented with some
> careful locking and checking of indicators that a component has gone away.
> So whenever you unbind one aspect it notifies the other one that it has
> gone away. I'm not sure where to currently look for best practice in this.
> 
> There are a lot of similar situations - basically anywhere a set of
> drivers end up hanging off two buses. v4l and media drivers in general
> end up doing this a lot.
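
Yes, that makes sense. Just to make sure I got your point, the below is
roughly how I picture the "has it gone away" check on the IP core side
(only a rough sketch, all names made up); it would at least turn the
oops in 2) into a graceful -ENODEV:

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/mutex.h>
#include <linux/types.h>

struct axi_adc_core {
	struct mutex lock;	/* serializes register access vs. unbind */
	void __iomem *regs;	/* only valid while 'bound' is true */
	bool bound;
};

/* called by the converter (front end) for every register access */
static int axi_adc_core_write(struct axi_adc_core *core, u32 reg, u32 val)
{
	int ret = 0;

	mutex_lock(&core->lock);
	if (!core->bound)
		/* the IP core was unbound; fail instead of oopsing */
		ret = -ENODEV;
	else
		writel(val, core->regs + reg);
	mutex_unlock(&core->lock);

	return ret;
}

/* called from the IP core .remove()/unbind path */
static void axi_adc_core_teardown(struct axi_adc_core *core)
{
	mutex_lock(&core->lock);
	/* any later call coming from the converter gets -ENODEV */
	core->bound = false;
	mutex_unlock(&core->lock);
}

The missing piece is, of course, how the unbind notification gets to the
other side, which is where the points below come in.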
> One I'm familiar with is how some of the CXL tear down happens and that
> led me to device_release_driver() which is also used by usb etc.
> I've not looked at this for a while but it may provide the tear down
> needed if the right dance is used. I think that will work if your
> converter driver is using services off the IP core driver and someone
> unbinds that ip-core driver.
> 

Yes, CCF does it with refcounting and some dummy clock ops for the case
where a clock provider is gone (with active consumers).

> > 3) This solution does not really scale. There are designs where we have
> > multiple instances of the IP core attached to the client (typically if
> > a converter like this has more than one channel, we have one instance
> > of the core per channel). The way we have it now, that would not be
> > feasible.
> 
> ...
> 
> > However, one thing that I'm not sure will be ideal with having two
> > devices is synchronizing accesses to the IP core. For example, the
> > axi-dac driver (which I also intend to bring upstream as part of this
> > work) also has a userspace interface, so we need to be careful.
> 
> I'm a bit lost. I think we'd only expect to see a userspace interface
> on the 'front end' driver. Is there something on the driver handling the
> IP core as well (beyond debug which you can do whatever you like with :)
> 

See below...

> > But I guess this is no different for devices which are accessed by
> > the in-kernel IIO interface.
> > 
> > I'm way more inclined to go with 2).
> 
> I'm not sure I fully follow what option 2) is! Perhaps a bit
> of ascii art?
> 

So option 2 is pretty much treating the IP core also as a separate IIO
device. This is not so visible in axi-adc, but on the DAC counterpart
the settings to generate tones (frequency/phase) are on the IP core, so
that could map to IIO ABI. However, I guess we can instead provide some
callbacks/ops for read_raw()/write_raw() to extend the front end
converter (this is what is done today upstream, but in the wrong
direction). That way, I think we could still expose only the front end
device.

> > Another thing that I've been thinking about is how we should implement
> > the IP core driver... As stated, there are use cases where we have
> > multiple instances of these devices. Well, that is how we've been
> > doing things in the ADI tree, but essentially the core is just one big
> > "block/device" [1]. However, the way we've been doing things is
> > actually ok because every sub-block on these IPs has its own memory
> > map, allowing us to control everything needed by the sub-blocks.
> 
> Device model wise, that sounds like something that would be neater split
> up. So one kernel device registered per sub-block. That should make
> it easier to handle referencing them from front end drivers. You can
> also then unbind individual blocks etc rather than having to do them
> in one go. If there is need for a 'parent' block then this might be
> a fit for the auxiliary bus framework - or something that looks like
> that but suits your particular requirements.
> 

Agreed. Instead of the auxiliary bus, I've been thinking of two options:

 * normal devicetree (FW) + kref based approach (as CCF, in-kernel IIO,
   etc);
 * the component API [1].

The component API looks to fit nicely in these designs and I would not
have to care about refcounting. It's also an all or nothing approach:
either all devices are present or none is, which, honestly, I think
makes sense. I also don't dislike the two staged binding approach...
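
To make that a bit more concrete, something along these lines is what
I'm picturing (just a rough sketch, all names are made up, and which
device ends up playing the aggregate role is still an open question):

#include <linux/component.h>
#include <linux/device.h>
#include <linux/of.h>
#include <linux/platform_device.h>

/* IP core side: register ourselves as a component */
static int axi_adc_bind(struct device *dev, struct device *master,
			void *master_data)
{
	/* hand the core's ops/state over to the front end */
	return 0;
}

static void axi_adc_unbind(struct device *dev, struct device *master,
			   void *master_data)
{
	/* the front end is told the core is going away */
}

static const struct component_ops axi_adc_comp_ops = {
	.bind = axi_adc_bind,
	.unbind = axi_adc_unbind,
};

static int axi_adc_probe(struct platform_device *pdev)
{
	return component_add(&pdev->dev, &axi_adc_comp_ops);
}

/* converter (front end) side: the aggregate device */
static int ad9467_master_bind(struct device *dev)
{
	/* all components are there: bind them and register the IIO device */
	return component_bind_all(dev, NULL);
}

static void ad9467_master_unbind(struct device *dev)
{
	/* one component went away: tear everything down */
	component_unbind_all(dev, NULL);
}

static const struct component_master_ops ad9467_master_ops = {
	.bind = ad9467_master_bind,
	.unbind = ad9467_master_unbind,
};

static int ad9467_compare_of(struct device *dev, void *data)
{
	return dev->of_node == data;
}

/* one match per IP core instance referenced by the converter node */
static int ad9467_add_backends(struct device *dev, struct device_node *core_np)
{
	struct component_match *match = NULL;

	component_match_add(dev, &match, ad9467_compare_of, core_np);
	return component_master_add_with_match(dev, &ad9467_master_ops, match);
}

With that, the "all or nothing" behavior and the teardown ordering would
be handled by the component core instead of us open coding refcounts.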
One thing that worries me, though, is that the component API looks to be
very tailored for DRM (there are even some DRM specific comments in the
code :)), which might bring some unexpected, unpleasant surprises.

> > Another option, now that we support output buffers and multiple
> > buffers per device, would be to support these cores as one device
> > (likely in the addac folder). However, I think that keeping each block
> > as a standalone device might keep things simpler. But I assume it's
> > tempting to get a first user for the multi buffer support :)
> 
> Do we end up with one front end, with a bunch of IP cores (or sub cores
> etc) behind it? If so I'd like the front end (where buffers are presented)
> to have a bunch of output buffers. I'd assume that there are controls
> for some front ends that affect the data flowing into multiple IP
> blocks? (gain etc). That front end (which would be the IIO driver)
> might well register for multiple backends (IP block driver).
> 

Yes. Think of high speed ADCs/DACs with more than one channel/data path.
Typically, each interface (LVDS/CMOS/JESD) of such a path is connected
to the FPGA, mapping to its own sub IP block. So yes, you're right. In
this case, as we are moving more toward the "front end" approach, we
will still have users for multi buffer support :).

> > Now for the big elephant in all of the above... Everything that I've
> > been saying means breaking the current ABI. I'm fairly confident that
> > we should not have any upstream users for these drivers because they
> > lack a lot of features that are only present in the ADI tree. So, I
> > think it's ok to destroy the ABI, but of course I cannot guarantee
> > there are no users :).
> 
> I'm less worried about doing this for this driver than the vast majority
> of others because at least for upstream I don't think we'll have huge
> numbers of users. + they'll shout at you not me ;)
> 
> > So, would it be fine for me to not care about the current ABI while
> > working on this? Because it will be very difficult to come up with
> > something that is compatible with what we have. Well, there's always
> > the option of coming up with a -v2 (or something like that) for these
> > drivers, keeping the ones we have now untouched. Ideas?
> 
> Probably good to lay out the current ABI as exposed and then we can see
> whether what we are changing is stuff not used in normal software flows
> anyway or if we are going to break critical stuff. IF we end up with
> a /sys/bus/iio/devices/iio:deviceX that has the same interfaces, then
> the user won't care that the driver structure underneath is totally
> different. I don't care about debug related interfaces remaining stable
> ABI.
> 

Maybe with this front end device logic, where we still expose only one
IIO device, we'll end up not breaking the ABI that much (if breaking it
at all).

> Going for a v2 may be the best option - marking the other one deprecated
> and getting rid of it after a number of years. If you (ADI in general)
> reckon you can get away with it then I'm happy. The use cases for this
> stuff tend to be sufficiently high end that I'd imagine you only have to
> deal with a fairly small number of customers and many of them won't
> use upstream directly anyway.
> 
> > So... feedback is very much appreciated :)
> 
> I think unfortunately this is going to be an area where experimentation
> is needed, particularly as I suspect you have a lot of different devices
> all with subtle differences in requirements. We need something that
> supports them all.
> For extra fun. In the ideal sense any driver for a converter should work
> equally well with different IP cores (if any ever come along).
> 

Yeah, that is actually my goal. In my head, I'm planning to have a tiny
middleware module where IP core drivers register with some ops() (that
we can always extend) and front end devices get() these cores and then
call the exposed API. Ideally, this middleware module should not have
any information about IP core specific internals (naturally, some
ops/API might be more tailored to some drivers, but we should try to
keep them as generic as possible)...

[1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/component.c

- Nuno Sá
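
P.S.: to make the middleware idea a bit more concrete, the below is
roughly the shape I have in mind (completely untested, every name is
made up just to illustrate, and the matching in get() would be done via
fwnode/OF references in a real implementation):

#include <linux/device.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/types.h>

/* ops implemented by the IP core drivers; we can always extend this */
struct iio_backend_ops {
	int (*enable)(struct device *dev);
	void (*disable)(struct device *dev);
	int (*chan_enable)(struct device *dev, unsigned int chan, bool en);
};

struct iio_backend {
	struct device *dev;
	const struct iio_backend_ops *ops;
	struct list_head entry;
};

static LIST_HEAD(iio_backend_list);
static DEFINE_MUTEX(iio_backend_lock);

/* called by the IP core driver at probe time */
int iio_backend_register(struct iio_backend *back)
{
	mutex_lock(&iio_backend_lock);
	list_add_tail(&back->entry, &iio_backend_list);
	mutex_unlock(&iio_backend_lock);

	return 0;
}

/* called by the front end (converter) driver to get() its IP core */
struct iio_backend *iio_backend_get(struct device *dev)
{
	struct iio_backend *back;

	mutex_lock(&iio_backend_lock);
	list_for_each_entry(back, &iio_backend_list, entry) {
		/* dummy match; a real one would use a fwnode reference */
		if (back->dev->parent == dev->parent) {
			mutex_unlock(&iio_backend_lock);
			return back;
		}
	}
	mutex_unlock(&iio_backend_lock);

	return ERR_PTR(-EPROBE_DEFER);
}

/* the front end never touches the core's registers, only the ops */
int iio_backend_chan_enable(struct iio_backend *back, unsigned int chan)
{
	if (!back->ops->chan_enable)
		return -EOPNOTSUPP;

	return back->ops->chan_enable(back->dev, chan, true);
}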