On Thu, Sep 01, 2022 at 01:02:22PM +0200, David Jander wrote: > On Thu, 1 Sep 2022 08:57:37 +0200 > Vincent Whitchurch <vincent.whitchurch@xxxxxxxx> wrote: > > You can reproduce the problem with: > > > > git remote add vwax https://github.com/vwax/linux.git > > git fetch vwax > > git archive vwax/roadtest/devel tools/testing/roadtest | tar xf - > > make -C tools/testing/roadtest/ -j24 OPTS="-k adc --rt-timeout 10 -v" > > > > The test passes on v5.19 but on current mainline it hangs and times out. > > This is very nice. Thanks. Great to hear that it was useful! > I followed your instructions, and if I apply the following, all tests are > passed. > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 83da8862b8f2..32c01e684af3 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1727,8 +1727,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread) > spin_unlock_irqrestore(&ctlr->queue_lock, flags); > > ret = __spi_pump_transfer_message(ctlr, msg, was_busy); > - if (!ret) > - kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); > + kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); > > ctlr->cur_msg = NULL; > ctlr->fallback = false; > > The problem is that if __spi_pump_transfer_message() fails, the ctlr->busy > flag is left true, so __spi_async() is not going to queue new work. The busy > transition is handled right above that piece of code, in > __spi_pump_transfer_message(), and the mechanism is to queue more work to do > it. Apparently work was only queued when the transfer was successful, and I am > not sure why it was like that. Queuing work unconditionally solves the issue > and should not be a problem. > > Curious thing is, that this change alone is sufficient to make all the > roadtest tests pass. But I still think that does not fix the bug reported by > Casper. For that, Mark's patch is also necessary. Yes, I noticed that too. If you comment out the two usages of "@flaky_bus" in tools/testing/roadtest/roadtest/tests/iio/adc/test_adc084s021.py then the bus error injection will be disabled and only the success paths will be tested. If you do that and test ae7d2346dc89ae89a6e0 ("spi: Don't use the message queue if possible in spi_sync") with the crash fix and iio sysfs fixes cherry-picked in, you'll see the following failure without Mark's patch: FAILED roadtest/tests/iio/adc/test_adc084s021.py::test_illuminance - BlockingIOError: [Errno 115] Operation now in progress But if you move forward to 69fa95905d40846756d22 ("spi: Ensure the io_mutex is held until spi_finalize_current_message()"), then the tests start passing again, if they're run without the error injection. So 69fa95905d40846756d22 seems to be masking the original problem from ae7d2346dc89ae89a6e0 (while at the same time introducing the other problem in the error handling).