Hi Mark!

> On 12.05.2019, at 10:54, Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> On Thu, May 09, 2019 at 09:47:08PM +0200, Martin Sperl wrote:
>
>> While thinking about this again maybe an idea:
>> What about implement a second spi_transfer_one implementation (together
>> with a message pump implementation) that would handle things correctly.
>
>> Any driver then can select the old (default) or new implementation and thus
>> would allow the optimizations to take place only for verified working drivers...
>
> I'd rather avoid having yet another interface for drivers to use, people
> already get confused trying to choose between the ones we already have.
> It'd have to be something where the existing drivers got actively
> converted and the old interface retired rather than something that hangs
> around.

I totally understand that.

>> What I would then also like to do for the new implementation is modify the
>> API a bit - ideally I would like to:
>> * Make spi_sync the primary interface which the message pump is also
>>   using directly
>> * move all the prepare stuff early into spi-sync, so that for example the
>>   preparing (including dma mapping) would get done in the calling thread
>>   and only the prepared message would get submitted to the queue
>>   - special processing would be needed for the spi-async case.
>
> IIRC the mapping is deliberately done late in order to minimize the
> amount of time we're consuming resources for the mapping, there were
> some systems that had limited DMA channels.  However I don't know how
> big a concern that is in this day and age with even relatively old
> systems.

We may be able to make the mapping either early or late. The place where
it really makes a difference is when we are running in the message pump
(because of spi_async, or because multiple threads are writing to the
same SPI bus via spi_sync).

> The idea of spi_async() having a separate path also makes me a
> bit nervous as it's much less widely used so more likely to get broken
> accidentially.

I will try to come up with something that addresses this.

> Otherwise pushing things out to the caller makes sense, it should have
> no real impact in the majority of cases where the thread is just getting
> used to idle the controller and the actual work is all happening in the
> calling context anyway and if the pump is being used it means it's
> spending more time actually pushing transfers out.
>
> For the case where we do have the message pump going one thing it'd be
> good to do is overlap more of the admin work around the messages with
> other transfers - ideally we'd be able to kick off the next transfer
> from within the completion of a DMA.  I need to have a dig around and
> figure out if I have any hardware that can actually support that, last
> time I looked at this my main system needed to kick everything up to the
> thread due to hardware requirements.

But to get all of this done I fear it will definitely require API changes
and thus a new kind of message pump.

Maybe the pump could be shared by multiple SPI (master) controllers.
This would help when there are, say, 4 devices, each connected to a
separate controller, all transferring short messages that get handled by
polling - as it stands that would mean 4 CPUs busy polling, which burns
a lot of CPU cycles. Pooling this polling in a single shared worker
would avoid that (see the sketch below).

I am also starting to wonder if there is a way to make the wakeup of
these threads fast/high-priority, so that the latency of spi_sync stays
minimal - essentially yielding the CPU to the "right" thread (so making
a yield cheap).
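Just to make the shared-pump idea a bit more concrete, here is a very
rough sketch of what I have in mind - spi_shared_pump_get() and the
"spi-shared-pump" name are made up for illustration, only the kthread_*
and sched_setscheduler() calls are existing API:

#include <linux/kthread.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <uapi/linux/sched/types.h>

static DEFINE_MUTEX(shared_pump_lock);
static struct kthread_worker *shared_pump;

/* get (and lazily create) the one worker all controllers would share */
static struct kthread_worker *spi_shared_pump_get(void)
{
	struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 };

	mutex_lock(&shared_pump_lock);
	if (!shared_pump) {
		struct kthread_worker *w;

		w = kthread_create_worker(0, "spi-shared-pump");
		if (IS_ERR(w)) {
			mutex_unlock(&shared_pump_lock);
			return w;	/* caller checks IS_ERR() */
		}
		/* keep the wakeup latency for spi_sync callers low */
		sched_setscheduler(w->task, SCHED_FIFO, &param);
		shared_pump = w;
	}
	mutex_unlock(&shared_pump_lock);
	return shared_pump;
}

/*
 * A controller would then queue its (already existing) pump work on
 * the shared worker instead of on a private one, roughly:
 *
 *	kthread_queue_work(spi_shared_pump_get(), &ctlr->pump_messages);
 */

Running that single worker with SCHED_FIFO would at the same time be a
first answer to the wakeup-latency question above, with the usual
caveats that RT priority brings.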
But let us see how far we get before we can tackle this...

From a performance/throughput perspective I guess it may be relevant to
extend the spi_test framework to also gather performance/latency
statistics, so that we have a means to compare actual performance
numbers and avoid regressions.

Martin
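P.S.: to illustrate the kind of statistics gathering I mean, a rough
sketch only - spi_sync_timed(), struct spi_perf_stats and the log
format are made up and not existing spi-test code; only spi_sync() and
the ktime/div64 helpers are real API:

#include <linux/ktime.h>
#include <linux/math64.h>
#include <linux/printk.h>
#include <linux/spi/spi.h>

struct spi_perf_stats {		/* hypothetical accumulator */
	u64 messages;
	s64 total_us;
	s64 min_us;
	s64 max_us;
};

/* wrap spi_sync() so that every message updates the statistics */
static int spi_sync_timed(struct spi_device *spi, struct spi_message *msg,
			  struct spi_perf_stats *stats)
{
	ktime_t start = ktime_get();
	int ret = spi_sync(spi, msg);
	s64 us = ktime_us_delta(ktime_get(), start);

	stats->messages++;
	stats->total_us += us;
	if (stats->messages == 1 || us < stats->min_us)
		stats->min_us = us;
	if (us > stats->max_us)
		stats->max_us = us;
	return ret;
}

/* dump the numbers after a test run so regressions show up in the log */
static void spi_perf_stats_report(const struct spi_perf_stats *stats)
{
	pr_info("spi-test: %llu msgs, avg %lld us, min %lld us, max %lld us\n",
		stats->messages,
		stats->messages ? div64_s64(stats->total_us,
					    stats->messages) : 0,
		stats->min_us, stats->max_us);
}

The test cases would then simply call spi_sync_timed() instead of
spi_sync() and report the accumulated numbers at the end of each run.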