Re: IIO: Ensuring DMA safe buffers.

Nuno Sá <noname.nuno@xxxxxxxxx> · Fri, 22 Apr 2022 09:13:54 +0200

On Thu, 2022-04-21 at 12:35 +0200, Nuno Sá wrote:
> On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote:
> > On Wed, 20 Apr 2022 14:30:36 +0200
> > Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > 
> > > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> > > > Hi All,
> > > > 
> > > > For a long time IIO has been making assumption that
> > > > ____cacheline_aligned
> > > > was sufficient to ensure buffers were in their own cacheline
> > > > and
> > > > hence
> > > > DMA safe.  We generally needed this for all SPI device drivers
> > > > as
> > > > many
> > > > SPI ABI calls can pass the buffer directly through to be used
> > > > for
> > > > DMA.
> > > > Not that regmap doesn't currently do this but it might in
> > > > future
> > > > (and
> > > > did
> > > > in the past).  I can't remember the history of this well enough
> > > > to
> > > > know
> > > > why we did it that way. I don't remember the pain of debugging
> > > > random
> > > > corruption caused by getting it wrong however...
> > > > 
> > > > However it turns out via
> > > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/
> > > >  
> > > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below
> > > > the
> > > > cache line size"
> > > > discussion that there are platforms where this isn't true and
> > > > need
> > > > 128 byte alignment despite a ____cacheline_size aligning to 64
> > > > bytes
> > > > on
> > > > all ARM64 platforms.
> > > >   
> > > 
> > > Oh boy... Here it goes my 5 cents on this
> > 
> > :)
> > 
> > > 
> > > > (I should probably have known that as the platform I support in
> > > > my
> > > > day
> > > > job has 128 byte cachelines in L3 - however it's DMA coherent
> > > > so
> > > > that's not a problem).
> > > > 
> > > > The above series also highlights that we can do much better
> > > > anyway on
> > > > some platforms
> > > > using the new ARCH_DMA_MINALIGN (currently it's only defined
> > > > for
> > > > some
> > > > archs but
> > > > after that patch everyone gets it).  We should be safe to use
> > > > that
> > > > everywhere
> > > > we currently force ___cachline_aligned and waste a little less
> > > > space
> > > > on padding
> > > > which is always nice ;)
> > > >   
> > > 
> > > The above series actually made me go an try to have some more
> > > understanding on kmalloc. I might be completely wrong but AFAIU,
> > > for
> > > our typical case, this won't matter much. Typically we have
> > > something
> > > like:
> > > 
> > > struct something {
> > >     //typically we do have more parameters
> > >     int val;
> > >     u8 buffer[8] ____cacheline_aligned;
> > > };
> > > 
> > > I thing the idea is to replace this by the compile time
> > > annotation
> > > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that
> > > kmalloc(sizeof(struct something)) will put us already in the
> > > kmalloc-
> > > 256 cache (right?) and then we will just get 256 aligned objects
> > > as
> > > this is a power of 2. The point is that as we normally do the
> > > compile
> > > time annotation we should still have the padding bytes as given
> > > by
> > > sizeof().
> > > 
> > > If I understood things correctly, the above series is way more
> > > beneficial for smaller kmalloc calls (without annotation) that
> > > won't
> > > need the 128 align constrain or for structures with zero length
> > > arrays
> > > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up
> > > in
> > > the
> > > kmalloc-192 cache.
> > > 
> > > Does the above make any sense?
> > If we keep the padding between iio_dev and iio_priv at
> > ARCH_DMA_MINALIGN
> > then we should be fine because we get
> > 
> 
> Sure... I was just thinking in a way of saving some space. But
> ARCH_DMA_MINALIGN is probably the safest way to go if we want to make
> sure things are right. For the record I was not even thinking in
> KMALLOC_MINALIGN but using something like cache_line_size() to get
> the
> __real__ size. Even though I think KMALLOC_MINALIGN would work even
> though our cache line might be 128. That will be already the case for

Well, this is total nonsense from me... KMALLOC_MINALIGN will fail
miserably if our cache line is bigger than it. For arm64 that would be
on KMALLOC_MINALIGN==64 and cache line==128. That said, aligning
iio_priv() to a cache line would be safe and we could save some space
in case the line is < ARCH_DMA_MINALIGN (which typically is).

Anyways, as Jonathan pointed out we typically just have one IIO object
per driver so the hassle is probably not worth it.

- Nuno Sá