On Thu, 1 Sep 2022 08:57:37 +0200 Vincent Whitchurch <vincent.whitchurch@xxxxxxxx> wrote: > On Wed, Aug 31, 2022 at 05:07:42PM +0100, Mark Brown wrote: > > On Mon, Aug 29, 2022 at 10:56:13AM +0200, David Jander wrote: > > > Not sure if this is a correct fix, but I'd like to know if your situation > > > changes this way, if you could try it. > > > I don't have access to any hardware with a mux unfortunately, so I can't test > > > it myself. > > > > I guess claiming to have a noop mux might work for testing, though I'd > > be dubious that it was actually doing the mux operations properly? I > > I'm seeing these problems with the roadtest test framework which uses > UML and doesn't need any special hardware. Roadtest puts its emulated > devices under a spi-mux with gpio-mockup and there are no tests for the > actual muxing, but other driver tests break since transfers using > spi-mux don't work properly. > > I pushed the latest version with SPI support to the tree below. The > test_adc084s021 tests a SPI device. Roadtest is placed inside a linux > tree, but it doesn't need any patches to the kernel: > > https://github.com/vwax/linux/commits/roadtest/devel > > You can reproduce the problem with: > > git remote add vwax https://github.com/vwax/linux.git > git fetch vwax > git archive vwax/roadtest/devel tools/testing/roadtest | tar xf - > make -C tools/testing/roadtest/ -j24 OPTS="-k adc --rt-timeout 10 -v" > > The test passes on v5.19 but on current mainline it hangs and times out. This is very nice. Thanks. I followed your instructions, and if I apply the following, all tests are passed. diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 83da8862b8f2..32c01e684af3 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1727,8 +1727,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread) spin_unlock_irqrestore(&ctlr->queue_lock, flags); ret = __spi_pump_transfer_message(ctlr, msg, was_busy); - if (!ret) - kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); + kthread_queue_work(ctlr->kworker, &ctlr->pump_messages); ctlr->cur_msg = NULL; ctlr->fallback = false; The problem is that if __spi_pump_transfer_message() fails, the ctlr->busy flag is left true, so __spi_async() is not going to queue new work. The busy transition is handled right above that piece of code, in __spi_pump_transfer_message(), and the mechanism is to queue more work to do it. Apparently work was only queued when the transfer was successful, and I am not sure why it was like that. Queuing work unconditionally solves the issue and should not be a problem. Curious thing is, that this change alone is sufficient to make all the roadtest tests pass. But I still think that does not fix the bug reported by Casper. For that, Mark's patch is also necessary. @Casper: can you test Mark's patch with above addition? > > think we need to either change spi_mux to duplicate the incoming message > > (that's probably "cleaner") or teach the core that spi-mux exists and > > should always use the pump/queue. The below might do the trick but in > > spite of my suggestion above I've not tested either yet: > > > > diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c > > index f5d32ec4634e..0709e987bd5a 100644 > > --- a/drivers/spi/spi-mux.c > > +++ b/drivers/spi/spi-mux.c > > @@ -161,6 +161,7 @@ static int spi_mux_probe(struct spi_device *spi) > > ctlr->num_chipselect = mux_control_states(priv->mux); > > ctlr->bus_num = -1; > > ctlr->dev.of_node = spi->dev.of_node; > > + ctlr->must_async = true; > > > > ret = devm_spi_register_controller(&spi->dev, ctlr); > > if (ret) > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > index 1cfed874f7ae..88d48a105d3c 100644 > > --- a/drivers/spi/spi.c > > +++ b/drivers/spi/spi.c > > @@ -4033,7 +4033,7 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message) > > * guard against reentrancy from a different context. The io_mutex > > * will catch those cases. > > */ > > - if (READ_ONCE(ctlr->queue_empty)) { > > + if (READ_ONCE(ctlr->queue_empty) && !ctlr->must_async) { > > message->actual_length = 0; > > message->status = -EINPROGRESS; > > > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > > index e6c73d5ff1a8..f089ee1ead58 100644 > > --- a/include/linux/spi/spi.h > > +++ b/include/linux/spi/spi.h > > @@ -469,6 +469,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch > > * SPI_TRANS_FAIL_NO_START. > > * @queue_empty: signal green light for opportunistically skipping the queue > > * for spi_sync transfers. > > + * @must_async: disable all fast paths in the core > > * > > * Each SPI controller can communicate with one or more @spi_device > > * children. These make a small bus, sharing MOSI, MISO and SCK signals > > @@ -690,6 +691,7 @@ struct spi_controller { > > > > /* Flag for enabling opportunistic skipping of the queue in spi_sync */ > > bool queue_empty; > > + bool must_async; > > }; > > > > static inline void *spi_controller_get_devdata(struct spi_controller *ctlr) > > > > Assuming that works out there'll be an extra test in the fast path but > > no sync operations, and a performance hit for spi-mux users. Hopefully > > that works as well as it did before. >[...] Best regards, -- David Jander Protonic Holland.