Re: adi-axi-adc issues and how to properly support this designs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2023-05-01 at 17:12 +0100, Jonathan Cameron wrote:
> On Tue, 18 Apr 2023 15:36:35 +0200
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> 
> > On Sun, 2023-04-16 at 14:33 +0100, Jonathan Cameron wrote:
> > > On Fri, 14 Apr 2023 11:11:43 +0200
> > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > >   
> > > > Hi Jonathan,
> > > > 
> > > > Thanks for the feedback. Definitely helpful...
> > > > 
> > > > On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:  
> > > > > On Fri, 31 Mar 2023 16:40:44 +0200
> > > > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > > > >     
> > > > > > Hi Jonathan,
> > > > > >     
> > > > > 
> > > > > Hmm. Complex topic, so I'd definitely like others to weigh in with
> > > > > opinions .
> > > > >     
> > > > > > There are some major issues with the implementation we have upstream
> > > > > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > > > > > rest of the email I will refer to ad9467 as the converter device and
> > > > > > axi-adc as the IP core. 
> > > > > > 
> > > > > > Let me start to state that these designs, in a very a basic way, have a
> > > > > > converter connected to a IP core that typically lives in a FPGA. This
> > > > > > core connects to a DMA core (which is used by the DMA buffer
> > > > > > implementation) so we can keep up with the high rates of these
> > > > > > converters. So there's a link between these devices and we try to
> > > > > > implement that so far looking at them as one single IIO device.
> > > > > > 
> > > > > > Let's go to the major issues now:
> > > > > > 
> > > > > > 1) There is a circular dependency between the two device. And when
> > > > > > compiled as modules for example, we cannot really rmmod the modules
> > > > > > anymore:
> > > > > > 
> > > > > > "root@analog:~# rmmod ad9467
> > > > > > rmmod: ERROR: Module ad9467 is in use
> > > > > > 
> > > > > > root@analog:~# rmmod adi-axi-adc.ko
> > > > > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > > > > > "
> > > > > > 
> > > > > > This easy to understand as the ad9467 driver as a direct symbol
> > > > > > dependency on the axi-adc one. And this one calls 'module_get()' as
> > > > > > soon as it attaches to a "client"    
> > > > > 
> > > > > Ouch. module_get() never works for this.  Long time ago I thought it
> > > > > did :(  An unbind will bypass that (as well any real hot unplug paths).
> > > > >     
> > > > 
> > > > Yeps, it bypasses it but I just wanted to point out the flaw in the current
> > > > design :)  
> > > 
> > > It was a surprise to me when Lars pointed this out years ago.
> > > All we can do is treat it as a hint for something that makes
> > > little sense for a user to deliberately do.
> > >   
> > > >   
> > > > >     
> > > > > > 
> > > > > > 2) It's fairly easy to crash the kernel:
> > > > > > 
> > > > > > "
> > > > > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > > > > > buffer0/in_voltage0_en
> > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > > > > > [  132.349133] 8<--- cut here ---
> > > > > > [  132.352193] Unable to handle kernel paging request at virtual
> > > > > > address e0940000 when read
> > > > > > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > > > > > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > > > > > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > > > > > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > > > > > [  132.382701] Hardware name: Xilinx Zynq Platform
> > > > > > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > > > > > [adi_axi_adc]
> > > > > > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > > > > > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > > > > > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > > > > > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > > > > > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > > > > > 000003f0
> > > > > > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > > > > > 00000000
> > > > > > "    
> > > > > 
> > > > > It's nasty but generally that sort of path can be prevented with some
> > > > > careful locking and checking of indicators that a component has gone away.
> > > > > So whenever you unbind one aspect it notifies the other one that it has
> > > > > gone
> > > > > away. I'm not sure where to currently look for best practice in this.
> > > > > 
> > > > > There are a lot of similar situations - basically anywhere a set of 
> > > > > drivers end up hanging off two buses.  v4l and media drivers in general
> > > > > end up doing this a lot.
> > > > > 
> > > > > One I'm familiar with is how some of the CXL tear down happens and
> > > > > that lead me to device_release_driver() which is also used by usb etc.
> > > > > I've not looked at this for a while but it may provide the tear down
> > > > > needed
> > > > > if the right dance is used.  I think that will work if your convertor
> > > > > driver is using services off the IP core driver and someone unbinds
> > > > > that ip-core driver.
> > > > >     
> > > > 
> > > > Yes, CCF does it with refcounting and some dummy clock ops for the case a
> > > > clock
> > > > provider is gone (with active consumers).  
> > > 
> > > That must be 'exciting' on occasion as no way to know if the clock disappeared
> > > or reset and hence the device interface might lock up. 
> > >   
> > > >   
> > > > >     
> > > > > > 
> > > > > > 3) This solution does not really scale. There are designs where we have
> > > > > > multiple instances of the IP core attached to the client (typically if
> > > > > > a converter like this has more than one channel, we have one instance
> > > > > > of the core per channel). The way we have it know, that would not me
> > > > > > feasable.
> > > > > >     
> > > > 
> > > > ...
> > > >   
> > > > > 
> > > > >     
> > > > > > However, one thing that
> > > > > > I'm not sure it will be ideal with having two devices is synchronizing
> > > > > > accesses to the IP core. For example, the axi-dac driver (that I also
> > > > > > intend to bring upstream as part of this work) also has an userspace
> > > > > > interface so we need to be carefull.    
> > > > > 
> > > > > I'm a bit lost. I think we'd only expect to see a userspace interface
> > > > > on the 'front end' driver. Is there something on the driver handling the
> > > > > IP core as well (beyond debug which you can do whatever you like with :)
> > > > >     
> > > > 
> > > > See below... 
> > > >   
> > > > > > But I guess this is no different
> > > > > > for devices which are accessed by the in-kernel IIO interface.
> > > > > > 
> > > > > > I'm way more inclined to go with 2).    
> > > > > 
> > > > > I'm not sure I fully follow what option 2) is!  Perhaps a bit
> > > > > of ascii art?
> > > > >     
> > > > 
> > > > So option 2 is pretty much treating the IP core also as a separate IIO
> > > > device.
> > > > This is not so visible in axi-adc but on the dac counterpart, the settings
> > > > to generate tones (frequency/phase) are on the core. So, this could map to
> > > > IIO ABI.  
> > > 
> > > Those are of interest to which bit of analog circuitry?  The big beyond the
> > > front end?  If so fine to provide that via the IIO interfaces, but the 
> > > front end device should proxy them (using the IIO in kernel consumer
> > > interfaces
> > > - which probably needs some work to make them safe to the provider going away.
> > > From below, seems that a more focused interface for this particular device
> > > representation may make sense.
> > >   
> > 
> > Not sure I'm following your question but I'll try... The settings are on the IP
> > core beyond the front end but I would say they are of interest of the front end
> > device. For instance, we might generate a tone on the IP core (setting phase and
> > frequency) but these bits are to be transmitted trough a dac/dds (our front end
> > device) analog, digital front end.
> 
> hehe. I avoided opening your email for a few weeks because I thought it might
> drag me down a rat hole.  Then I find there is just a nice clean reply to
> my question that would have taken me 2 mins to read.
> 

Ehehe. nice surprise then :)

> That makes complete sense.  To my mind only the front end needs to be an IIO
> device, but if it's convenient to represent the back end as one we can probably
> make that work as well.
> > 

Alright, I'll go then the one IIO device route for v1 (at least start working with
that goal :)) so we can see how it looks like...

Thanks for your feedback on this!

- Nuno Sá




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Input]     [Linux Kernel]     [Linux SCSI]     [X.org]

  Powered by Linux