Hi,

On Tue, Dec 15, 2020 at 5:18 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Doug Anderson (2020-12-15 15:34:59)
> > On Tue, Dec 15, 2020 at 2:25 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > >
> > > Quoting Doug Anderson (2020-12-15 09:25:51)
> > > > In general when we're starting a new transfer we assume that we can
> > > > program the hardware willy-nilly.  If there's some chance something
> > > > else is happening (or our interrupt could go off) then it breaks that
> > > > whole model.
> > >
> > > Right. I thought this patch was making sure that the hardware wasn't in
> > > the process of doing something else when we setup the transfer. I'm
> > > saying that only checking the irq misses the fact that maybe the
> > > transfer hasn't completed yet or a pending irq hasn't come in yet, but
> > > the fifo status would tell us that the fifo is transferring something or
> > > receiving something. If an RX can't happen, then the code should clearly
> > > show that an RX irq isn't expected, and mask out that bit so it is
> > > ignored or explicitly check for it and call WARN_ON() if the bit is set.
> > >
> > > I'm wondering why we don't check the FIFO status and the irq bits to
> > > make sure that some previous cancelled operation isn't still pending
> > > either in the FIFO or as an irq. While this patch will fix the scenario
> > > where the irq is delayed but pending in the hardware it won't cover the
> > > case that the hardware itself is wedged, for example because the
> > > sequencer just decided to stop working entirely.
> >
> > It also won't catch the case where the SoC decided that all GPIOs are
> > inverted and starts reporting highs for lows and lows for highs, nor
> > does it handle the case where the CPU suddenly switches to Big Endian
> > mode for no reason.  :-P
> >
> > ...by that, I mean I'm not trying to catch the case where the hardware
> > itself is behaving in a totally unexpected way.  I have seen no
> > instances where the hardware wedges nor where the sequencer stops
> > working and until I see them happen I'm not inclined to add code for
> > them.  Without seeing them actually happen I'm not really sure what
> > the right way to recover would be.  We've already tried "cancel" and
> > "abort" and then waited at least 1 second.  If you know of some sort
> > of magic "unwedge" then we should add it into handle_fifo_timeout().
>
> I am not aware of an "unwedge" command. Presumably the cancel/abort
> stuff makes the FIFO state "sane" so there's nothing to see in the FIFO
> status registers. I wonder if we should keep around some "did we cancel
> last time?" flag and only check the isr if we canceled out and timed
> out to boot? That would be a cheap and easy check to make sure that we
> don't check this each transaction.

Sure.  I guess technically it would be a "did we fail to cancel last time".

> > However, super delayed interrupts due to software not servicing the
> > interrupt in time is something that really happens, if rarely.  Adding
> > code to account for that seems worth it and is easy to test...
> >
>
> Agreed. The function name is wrong then as the device is not "busy". So
> maybe spi_geni_isr_pending()? That would clearly describe what's being
> checked.

I changed this to just be about the abort.  See if v2 looks better to you.
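
For illustration only, the "did we fail to cancel last time" idea discussed
above could be sketched roughly as below. This is not the v2 patch and not
the driver's real code: the struct, its fields and the function name are all
hypothetical, and the registers are simulated as plain integers so the
snippet can be compiled in userspace. The point is just that the pending-irq
check is skipped unless the previous transfer ended in a failed abort.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller state; not the driver's real struct. */
struct fake_spi {
	uint32_t irq_status;  /* simulated interrupt status register */
	uint32_t irq_enable;  /* simulated interrupt enable register */
	bool abort_failed;    /* set when cancel and abort both timed out */
};

/* Returns true if a new transfer should be refused for now. */
static bool fake_spi_abort_still_pending(struct fake_spi *spi)
{
	/* Fast path: the last transfer did not end in a failed abort. */
	if (!spi->abort_failed)
		return false;

	/* Only enabled interrupts matter; a set bit means the handler has not run yet. */
	if (spi->irq_status & spi->irq_enable)
		return true;

	/* The delayed interrupt was serviced; stop checking on later transfers. */
	spi->abort_failed = false;
	return false;
}
```

In this sketch a caller would invoke fake_spi_abort_still_pending() before
programming a new transfer and bail out if it returns true, so the common
case costs one flag test per transaction.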