Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe

On Thu, 20 Mar 2025 15:34:42 +0000
Mark Brown <broonie@xxxxxxxxxx> wrote:

> On Thu, Mar 20, 2025 at 03:35:36PM +0100, Petr Tesarik wrote:
> 
> > CC'ing Robin Murphy, because there seem to be some doubts about DMA API
> > efficiency.  
> 
> Or possibly just documentation, the number of memory types we have to
> deal with and disjoint interfaces makes all this stuff pretty miserable.

I have to agree here. Plus the existing documentation is confusing, as
it introduces some opaque terms: streaming, consistent, coherent ...
what next?

I volunteer to clean it up a bit. Or at least to give it a try.

> > > > The whole goal there is to try to avoid triggering another copy to do
> > > > the DMA so just reverting rather than replacing with some other
> > > > construct that achieves the same goal doesn't seem great.  I think
> > > > possibly we should just not do the copy at all any more and trust the
> > > > core to DTRT with any buffers that are passed in, I think we've got
> > > > enough stuff in the core though I can't remember if it'll copy with
> > > > things allocated on the stack well.  I'd need to page the status back
> > > > in.    
> 
> > No, I'm afraid kernel stack addresses (and many other types of
> > addresses) cannot be used for DMA:  
> 
> > https://docs.kernel.org/core-api/dma-api-howto.html#what-memory-is-dma-able  
> 
> Right, that's what I thought.  Part of what spi_write_then_read() is
> doing is taking the edge off that particular sharp edge for users, on
> the off chance that the controller wants to DMA.

Thanks for explaining the goal. It seems that most subsystems pass this
complexity down to individual device drivers, and I agree that it is
not the best approach.

If we want to make life easier for authors who don't need to squeeze
the last bit of performance from their driver, the core DMA API could be
extended with a wrapper function that checks DMA-ability of a buffer
address and takes the appropriate action. I kind of like the idea, but
I'm not a subsystem maintainer, so my opinion doesn't mean much. ;-)

> > > From what I found, there are two scenarios that may depend on
> > > GFP_DMA today:  
> 
> > >  a) a performance optimization where allocating from GFP_DMA avoids
> > >     the swiotlb bounce buffering. This would be the normal case on
> > >     any 64-bit machine with more than 4GB of RAM and an SPI
> > >     controller with a 32-bit DMA mask.  
> 
> > I must be missing something. How is a memcpy() in spi_write_then_read()
> > faster than a memcpy() by swiotlb?  
> 
> spi_write_then_read() is just a convenience API, a good proportion of
> users will be using spi_sync() directly.

Got it.

> > I still believe the SPI subsystem should not try to be clever. The
> > DMA API already avoids unnecessary copying as much as possible.  
> 
> It's not particularly trying to be clever here?

Well, it tries to guess whether the lower layer will have to make a
copy, but it does not always get it right (e.g. memory encryption).

Besides, txbuf and rxbuf might be usable without any copying at all,
for example when they point to direct-mapped virtual addresses (such as
those returned by kmalloc).

At the end of the day, it's no big deal, because SPI transfers are
usually small and not performance-critical. I wouldn't be bothered
myself if it wasn't part of a larger project (getting rid of DMA zones).

Petr T



