On Tue, 24 May 2022 20:46:16 +0100 Mark Brown <broonie@xxxxxxxxxx> wrote:

> On Tue, May 24, 2022 at 01:30:48PM +0200, David Jander wrote:
>
> > > > But that turned out not to be working properly:
> > > >
> > > > | https://lore.kernel.org/all/f86eaebb-0359-13be-f4a2-4f2b8832252e@xxxxxxxxxx/
>
> > Ah, thanks for sharing. Added Martin to CC here.
> >
> > I have been struggling with this too. There are definitely dragons
> > somewhere. I have tried to do tear-down in the same context if
> > possible, similar to this:
>
> There's a potential issue there with ending up spending noticeable extra
> time turning the controller on and off; how costly that is might be
> variable.
>
> > I have not yet discovered exactly why. On the occasions the code didn't
> > hit the race, it seemed to have a notable performance impact though, so
> > removing this context switch may be worth the effort.
>
> Or come up with a mechanism for ensuring we only switch to do the
> cleanup when we're not otherwise busy.

I think the PM part might benefit from some optimizations too...

> > From what I understand of this, bus_lock_mutex is used to serialize
> > spi_sync calls for this bus, so there cannot be any concurrency from
> > different threads doing spi_sync() calls to the same bus. This means
> > that if spi_sync was the only path in existence, bus_lock_mutex would
> > suffice, and all the other
>
> The bus lock is there because some devices have an unfortunate need to
> do multiple SPI transfers with no other devices able to generate any
> traffic on the bus in between. It seems that even more sadly some of
> the users are using it to protect against multiple calls into
> themselves, so we can't just do the simple thing and turn the bus locks
> into noops if there's only one client on the bus. However it *is* quite
> rarely used, so I'm thinking that what we could do is default to not
> having it and then arrange to create it on first use, or just update
> the clients to do something during initialisation to cause it to be
> created. That way only controllers with an affected client would pay
> the cost.
>
> I don't *think* it's otherwise needed but would need to go through and
> verify that.
>
> > spinlocks and mutexes could go. Big win. But the async path is what
> > complicates everything. So I have been thinking... what if we could
> > make the sync and the async case two separate paths, with two separate
> > message queues? In fact the sync path doesn't even need a queue really,
> > since it will just handle one message beginning to end, and not return
> > until the message is done. It doesn't need the worker thread either
> > AFAICS. Or am I missing something? In the end, both paths would
> > converge at the io_mutex. I am tempted to try this route. Any thoughts?
>
> The sync path, like you say, doesn't exactly need a queue itself. It's
> partly looking at the queue so it can fall back to pushing work through
> the thread in the case that the controller is busy (hopefully opening up
> opportunities for us to minimise the latency between completing whatever
> is going on already and starting the next message), and partly about
> pushing the work of idling the hardware into the thread so that it's
> deferred a bit and we're less likely to end up spending time bouncing
> the controller on and off if there's a sequence of synchronous
> operations going on. That second bit doesn't need us to actually look
> at the queue though; we just need to kick the thread so it gets run at
> some point and sees that the queue is empty.
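To make that a bit more concrete, here is a very rough sketch of the shape
I have in mind for the sync side. It is purely illustrative and not code
from any patch: __spi_sync_noqueue() and __spi_transfer_message_noqueue()
are made-up names, and the busy check is deliberately simplistic.

/* Illustrative sketch only, not taken from the RFC series. */
#include <linux/spi/spi.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/errno.h>

/* Made-up stand-in for "program the hardware and wait for it to finish".
 * This is also where the spi_finalize_current_message() vs. ctlr->cur_msg
 * problem shows up, since the existing hooks assume the queued path has
 * set cur_msg before they run. */
static int __spi_transfer_message_noqueue(struct spi_controller *ctlr,
                                          struct spi_message *msg)
{
        return ctlr->transfer_one_message(ctlr, msg);
}

static int __spi_sync_noqueue(struct spi_device *spi,
                              struct spi_message *msg)
{
        struct spi_controller *ctlr = spi->controller;
        unsigned long flags;
        bool fallback;
        int ret;

        /* Advisory (and, as written, racy) check: if the queued/async path
         * is active, fall back to it so ordering against async messages is
         * preserved -- the concern you raise below. */
        spin_lock_irqsave(&ctlr->queue_lock, flags);
        fallback = ctlr->busy || !list_empty(&ctlr->queue);
        spin_unlock_irqrestore(&ctlr->queue_lock, flags);
        if (fallback)
                return -EAGAIN; /* caller uses the queued spi_sync() path */

        /* Controller idle: no queue, no worker thread, no context switch.
         * Both paths still converge on io_mutex. */
        mutex_lock(&ctlr->io_mutex);
        ret = __spi_transfer_message_noqueue(ctlr, msg);
        mutex_unlock(&ctlr->io_mutex);

        return ret;
}

The point is only that a sync call never has to touch the queue or the
kthread when the controller is idle; everything else would stay on the
existing queued path.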
> Again I need to think this through properly but we can probably arrange
> things so that
>
> > --> __spi_sync()
> >  --> bus_lock_spinlock
> >   --> queue_lock
> >    --> list_add_tail()
> >  --> __spi_pump_messages() (also entered here from WQ)
> >   --> queue_lock
> >    --> list_first_entry()
>
> the work we do under the first queue_lock here gets shuffled into
> __spi_pump_messages() (possibly replace in_kthread with passing in a
> message? Would need comments.). That'd mean we'd at least only be
> taking the queue lock once.
>
> The other potential issue with eliminating the queue entirely would be
> if there are clients which are mixing async and sync operations but
> expecting them to complete in order (eg, start a bunch of async stuff
> then do a sync operation at the end rather than have a custom
> wait/completion).

I thought it would be easier to look at some code at this point, so I
just sent out an RFC patch series for the discussion. As for the ordering
problem you mention, I think I have solved that. See here:

https://marc.info/?l=linux-spi&m=165348892007041&w=2

I can understand if you ultimately reject this series though, since I
could not avoid making a small change to the client driver API. I just
can't figure out how to do it without that change. The problem is that
spi_finalize_current_message() relies on ctlr->cur_msg, and we cannot
touch that if we skip the queue. Sorry for that.

Best regards,

--
David Jander
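PS: In case it saves you a trip through the series, this is roughly the
shape of the cur_msg problem as seen from a controller driver. The
foo_transfer_one_message() driver and the spi_finalize_message() helper
mentioned in the comment are made up for illustration; they are not the
actual interface the RFC proposes.

/* Illustration only, not the interface from the RFC series. */
#include <linux/spi/spi.h>

static int foo_transfer_one_message(struct spi_controller *ctlr,
                                    struct spi_message *msg)
{
        /* ...program the hardware and wait for the transfers to finish... */

        /* Today this works only because the queued path stored
         * ctlr->cur_msg = msg before calling into the driver; if a sync
         * message skips the queue, cur_msg is never set and this breaks. */
        spi_finalize_current_message(ctlr);

        /* A queue-less path wants the message named explicitly, i.e.
         * something like (hypothetical helper, not an existing API):
         *
         *      spi_finalize_message(ctlr, msg);
         */
        return 0;
}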