Quoting Doug Anderson (2020-12-10 17:04:06) > On Thu, Dec 10, 2020 at 4:51 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > > > I'm worried about the buffer disappearing if spi core calls handle_err() > > but the geni_spi_isr() handler runs both an rx and a cancel/abort > > routine. That doesn't seem to be the case though so it looks all fine. > > It would be pretty racy if that was the case. Until it calls > handle_timeout() we're still free to write to that buffer, right? Right I don't see any badness here. > > > > > If we want to try to do better, we can do timeout handling ourselves. > > > The SPI core allows for that. > > > > > > > > > > So why don't we check for cur_xfer being NULL in the rx/tx handling > > > > paths too and bail out there? Does the FIFO need to be cleared out in > > > > such a situation that spi core thinks a timeout happened but there's RX > > > > data according to m_irq? Do we need to read it all and throw it away? Or > > > > does the abort/cancel clear out the RX fifo? > > > > > > I don't know for sure, but IMO it's safest to read anything that's in > > > the FIFO. It's also important to adjust the watermark in the TX case. > > > The suggestions I provided in my original reply (#2 and #3) handle > > > this and are plenty simple. > > > > > > As per my original reply, though, anything we do in the ISR doesn't > > > replace the changes we need to make to handle_fifo_timeout(). It is > > > very important that when handle_fifo_timeout() finishes that no future > > > interrupts for old transfers will fire. > > > > > > > Alright. With a proper diagram in the commit text I think doing all of > > the points, 1 through 3, would be good and required to leave the > > hardware in a sane state for all the scenarios. Why do we need to call > > synchronize_irq() at the start and end of handle_fifo_timeout() though? > > Presumably having it at the start would make sure the long delayed irq > > runs and handles any rx/tx by throwing it away. Sounds great, but having > > it at the end leaves me confused. We want to make sure the cancel really > > went through? Don't we know that because the completion variable for > > cancel succeeded? > > I want it to handle the case where the "abort" command timed out! :-) > If the "abort" command timed out then it's still pending and we could > get an interrupt for it at some future point in time. Sure but who cares? We set a completion variable if abort comes in later. We'll put a message in the log indicating that it "Failed" but otherwise handle_fifo_timeout() can't return an error so we have to give up eventually. > > > > I guess I'm not convinced that the hardware is so bad that it cancels > > and aborts the sequencer, raises an irq for that, and then raises an irq > > for the earlier rx/tx that the sequencer canceled out. Is that > > happening? It's called a sequencer because presumably it runs a sequence > > of operations like tx, rx, cs change, cancel, and abort. Hopefully it > > doesn't run them out of order. If they run at the same time it's fine, > > the irq handler will see all of them and throw away reads, etc. > > Maybe answered by me explaining that I'm worried about the case where > "abort" times out (and thus the "done" from the abort is still > pending). > > NOTE: I will also assert that if we send the "abort" then it seems > like it has a high likelihood of timing out. Why do I say that? In > order to even get to sending the "abort", it means: > > a) The original transfer timed out > > b) The "cancel" timed out. As you can see, if the "cancel" doesn't > time out we don't even send the "abort" > > ...so two things timed out, one of which we _just_ sent. The "abort" > feels like a last ditch effort. Might as well try it, but things were > in pretty sorry shape to start with by the time we tried it. > Yeah and so if it comes way later because it timed out then what's the point of calling synchronize_irq() again? To make the completion variable set when it won't be tested again until it is reinitialized?