Hi, Thanks for the help and fixes. I did some testing with your patches. I saw you already submitted the patches, but for reference here's my results. David's initial fix (changing async->sync): no issues Mark's fix: no issues David's second fix: Issue still occurs. But does not cause any problems and together with Mark's fix it works fine. Best Regards, Casper On 2022-09-01 13:02, David Jander wrote: > On Thu, 1 Sep 2022 08:57:37 +0200 > Vincent Whitchurch <vincent.whitchurch@xxxxxxxx> wrote: > > > On Wed, Aug 31, 2022 at 05:07:42PM +0100, Mark Brown wrote: > > > On Mon, Aug 29, 2022 at 10:56:13AM +0200, David Jander wrote: > > > > Not sure if this is a correct fix, but I'd like to know if your situation > > > > changes this way, if you could try it. > > > > I don't have access to any hardware with a mux unfortunately, so I can't test > > > > it myself. > > > > > > I guess claiming to have a noop mux might work for testing, though I'd > > > be dubious that it was actually doing the mux operations properly? I > > > > I'm seeing these problems with the roadtest test framework which uses > > UML and doesn't need any special hardware. Roadtest puts its emulated > > devices under a spi-mux with gpio-mockup and there are no tests for the > > actual muxing, but other driver tests break since transfers using > > spi-mux don't work properly. > > > > I pushed the latest version with SPI support to the tree below. The > > test_adc084s021 tests a SPI device. Roadtest is placed inside a linux > > tree, but it doesn't need any patches to the kernel: > > > > https://github.com/vwax/linux/commits/roadtest/devel > > > > You can reproduce the problem with: > > > > git remote add vwax https://github.com/vwax/linux.git > > git fetch vwax > > git archive vwax/roadtest/devel tools/testing/roadtest | tar xf - > > make -C tools/testing/roadtest/ -j24 OPTS="-k adc --rt-timeout 10 -v" > > > > The test passes on v5.19 but on current mainline it hangs and times out. > > This is very nice. Thanks. > I followed your instructions, and if I apply the following, all tests are > passed. > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 83da8862b8f2..32c01e684af3 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1727,8 +1727,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread) > spin_unlock_irqrestore(&ctlr->queue_lock, flags); > > ret = __spi_pump_transfer_message(ctlr, msg, was_busy); > - if (!ret) > - kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); > + kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); > > ctlr->cur_msg = NULL; > ctlr->fallback = false; > > The problem is that if __spi_pump_transfer_message() fails, the ctlr->busy > flag is left true, so __spi_async() is not going to queue new work. The busy > transition is handled right above that piece of code, in > __spi_pump_transfer_message(), and the mechanism is to queue more work to do > it. Apparently work was only queued when the transfer was successful, and I am > not sure why it was like that. Queuing work unconditionally solves the issue > and should not be a problem. > > Curious thing is, that this change alone is sufficient to make all the > roadtest tests pass. But I still think that does not fix the bug reported by > Casper. For that, Mark's patch is also necessary. > > @Casper: can you test Mark's patch with above addition? > > > > think we need to either change spi_mux to duplicate the incoming message > > > (that's probably "cleaner") or teach the core that spi-mux exists and > > > should always use the pump/queue. The below might do the trick but in > > > spite of my suggestion above I've not tested either yet: > > > > > > diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c > > > index f5d32ec4634e..0709e987bd5a 100644 > > > --- a/drivers/spi/spi-mux.c > > > +++ b/drivers/spi/spi-mux.c > > > @@ -161,6 +161,7 @@ static int spi_mux_probe(struct spi_device *spi) > > > ctlr->num_chipselect = mux_control_states(priv->mux); > > > ctlr->bus_num = -1; > > > ctlr->dev.of_node = spi->dev.of_node; > > > + ctlr->must_async = true; > > > > > > ret = devm_spi_register_controller(&spi->dev, ctlr); > > > if (ret) > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > > index 1cfed874f7ae..88d48a105d3c 100644 > > > --- a/drivers/spi/spi.c > > > +++ b/drivers/spi/spi.c > > > @@ -4033,7 +4033,7 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message) > > > * guard against reentrancy from a different context. The io_mutex > > > * will catch those cases. > > > */ > > > - if (READ_ONCE(ctlr->queue_empty)) { > > > + if (READ_ONCE(ctlr->queue_empty) && !ctlr->must_async) { > > > message->actual_length = 0; > > > message->status = -EINPROGRESS; > > > > > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > > > index e6c73d5ff1a8..f089ee1ead58 100644 > > > --- a/include/linux/spi/spi.h > > > +++ b/include/linux/spi/spi.h > > > @@ -469,6 +469,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch > > > * SPI_TRANS_FAIL_NO_START. > > > * @queue_empty: signal green light for opportunistically skipping the queue > > > * for spi_sync transfers. > > > + * @must_async: disable all fast paths in the core > > > * > > > * Each SPI controller can communicate with one or more @spi_device > > > * children. These make a small bus, sharing MOSI, MISO and SCK signals > > > @@ -690,6 +691,7 @@ struct spi_controller { > > > > > > /* Flag for enabling opportunistic skipping of the queue in spi_sync */ > > > bool queue_empty; > > > + bool must_async; > > > }; > > > > > > static inline void *spi_controller_get_devdata(struct spi_controller *ctlr) > > > > > > Assuming that works out there'll be an extra test in the fast path but > > > no sync operations, and a performance hit for spi-mux users. Hopefully > > > that works as well as it did before. > >[...] > > Best regards, > > -- > David Jander > Protonic Holland.