Re: [PROBLEM] spi driver internal error during boot on sparx5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 1 Sep 2022 08:57:37 +0200
Vincent Whitchurch <vincent.whitchurch@xxxxxxxx> wrote:

> On Wed, Aug 31, 2022 at 05:07:42PM +0100, Mark Brown wrote:
> > On Mon, Aug 29, 2022 at 10:56:13AM +0200, David Jander wrote:  
> > > Not sure if this is a correct fix, but I'd like to know if your situation
> > > changes this way, if you could try it.
> > > I don't have access to any hardware with a mux unfortunately, so I can't test
> > > it myself.  
> > 
> > I guess claiming to have a noop mux might work for testing, though I'd
> > be dubious that it was actually doing the mux operations properly?  I  
> 
> I'm seeing these problems with the roadtest test framework which uses
> UML and doesn't need any special hardware.  Roadtest puts its emulated
> devices under a spi-mux with gpio-mockup and there are no tests for the
> actual muxing, but other driver tests break since transfers using
> spi-mux don't work properly.
> 
> I pushed the latest version with SPI support to the tree below.  The
> test_adc084s021 tests a SPI device.  Roadtest is placed inside a linux
> tree, but it doesn't need any patches to the kernel:
> 
>  https://github.com/vwax/linux/commits/roadtest/devel
> 
> You can reproduce the problem with:
> 
>  git remote add vwax https://github.com/vwax/linux.git
>  git fetch vwax
>  git archive vwax/roadtest/devel tools/testing/roadtest | tar xf -
>  make -C tools/testing/roadtest/ -j24 OPTS="-k adc --rt-timeout 10 -v"
> 
> The test passes on v5.19 but on current mainline it hangs and times out.

This is very nice. Thanks.
I followed your instructions, and if I apply the following, all tests are
passed.

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 83da8862b8f2..32c01e684af3 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1727,8 +1727,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
        spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
        ret = __spi_pump_transfer_message(ctlr, msg, was_busy);
-       if (!ret)
-               kthread_queue_work(ctlr->kworker, &ctlr->pump_messages);
+       kthread_queue_work(ctlr->kworker, &ctlr->pump_messages);
 
        ctlr->cur_msg = NULL;
        ctlr->fallback = false;

The problem is that if __spi_pump_transfer_message() fails, the ctlr->busy
flag is left true, so __spi_async() is not going to queue new work. The busy
transition is handled right above that piece of code, in
__spi_pump_transfer_message(), and the mechanism is to queue more work to do
it. Apparently work was only queued when the transfer was successful, and I am
not sure why it was like that. Queuing work unconditionally solves the issue
and should not be a problem.

Curious thing is, that this change alone is sufficient to make all the
roadtest tests pass. But I still think that does not fix the bug reported by
Casper. For that, Mark's patch is also necessary.

@Casper: can you test Mark's patch with above addition?

> > think we need to either change spi_mux to duplicate the incoming message
> > (that's probably "cleaner") or teach the core that spi-mux exists and
> > should always use the pump/queue.  The below might do the trick but in
> > spite of my suggestion above I've not tested either yet:
> > 
> > diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c
> > index f5d32ec4634e..0709e987bd5a 100644
> > --- a/drivers/spi/spi-mux.c
> > +++ b/drivers/spi/spi-mux.c
> > @@ -161,6 +161,7 @@ static int spi_mux_probe(struct spi_device *spi)
> >  	ctlr->num_chipselect = mux_control_states(priv->mux);
> >  	ctlr->bus_num = -1;
> >  	ctlr->dev.of_node = spi->dev.of_node;
> > +	ctlr->must_async = true;
> >  
> >  	ret = devm_spi_register_controller(&spi->dev, ctlr);
> >  	if (ret)
> > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> > index 1cfed874f7ae..88d48a105d3c 100644
> > --- a/drivers/spi/spi.c
> > +++ b/drivers/spi/spi.c
> > @@ -4033,7 +4033,7 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
> >  	 * guard against reentrancy from a different context. The io_mutex
> >  	 * will catch those cases.
> >  	 */
> > -	if (READ_ONCE(ctlr->queue_empty)) {
> > +	if (READ_ONCE(ctlr->queue_empty) && !ctlr->must_async) {
> >  		message->actual_length = 0;
> >  		message->status = -EINPROGRESS;
> >  
> > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> > index e6c73d5ff1a8..f089ee1ead58 100644
> > --- a/include/linux/spi/spi.h
> > +++ b/include/linux/spi/spi.h
> > @@ -469,6 +469,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch
> >   *	SPI_TRANS_FAIL_NO_START.
> >   * @queue_empty: signal green light for opportunistically skipping the queue
> >   *	for spi_sync transfers.
> > + * @must_async: disable all fast paths in the core
> >   *
> >   * Each SPI controller can communicate with one or more @spi_device
> >   * children.  These make a small bus, sharing MOSI, MISO and SCK signals
> > @@ -690,6 +691,7 @@ struct spi_controller {
> >  
> >  	/* Flag for enabling opportunistic skipping of the queue in spi_sync */
> >  	bool			queue_empty;
> > +	bool			must_async;
> >  };
> >  
> >  static inline void *spi_controller_get_devdata(struct spi_controller *ctlr)
> > 
> > Assuming that works out there'll be an extra test in the fast path but
> > no sync operations, and a performance hit for spi-mux users.  Hopefully
> > that works as well as it did before.  
>[...]

Best regards,

-- 
David Jander
Protonic Holland.



[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux