Re: IIO: Ensuring DMA safe buffers.

Nuno Sá <noname.nuno@xxxxxxxxx> · Thu, 21 Apr 2022 12:35:50 +0200

On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote:
> On Wed, 20 Apr 2022 14:30:36 +0200
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> 
> > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> > > Hi All,
> > > 
> > > For a long time IIO has been making assumption that
> > > ____cacheline_aligned
> > > was sufficient to ensure buffers were in their own cacheline and
> > > hence
> > > DMA safe.  We generally needed this for all SPI device drivers as
> > > many
> > > SPI ABI calls can pass the buffer directly through to be used for
> > > DMA.
> > > Not that regmap doesn't currently do this but it might in future
> > > (and
> > > did
> > > in the past).  I can't remember the history of this well enough
> > > to
> > > know
> > > why we did it that way. I don't remember the pain of debugging
> > > random
> > > corruption caused by getting it wrong however...
> > > 
> > > However it turns out via
> > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/
> > >  
> > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the
> > > cache line size"
> > > discussion that there are platforms where this isn't true and
> > > need
> > > 128 byte alignment despite a ____cacheline_size aligning to 64
> > > bytes
> > > on
> > > all ARM64 platforms.
> > >   
> > 
> > Oh boy... Here it goes my 5 cents on this
> 
> :)
> 
> > 
> > > (I should probably have known that as the platform I support in
> > > my
> > > day
> > > job has 128 byte cachelines in L3 - however it's DMA coherent so
> > > that's not a problem).
> > > 
> > > The above series also highlights that we can do much better
> > > anyway on
> > > some platforms
> > > using the new ARCH_DMA_MINALIGN (currently it's only defined for
> > > some
> > > archs but
> > > after that patch everyone gets it).  We should be safe to use
> > > that
> > > everywhere
> > > we currently force ___cachline_aligned and waste a little less
> > > space
> > > on padding
> > > which is always nice ;)
> > >   
> > 
> > The above series actually made me go an try to have some more
> > understanding on kmalloc. I might be completely wrong but AFAIU,
> > for
> > our typical case, this won't matter much. Typically we have
> > something
> > like:
> > 
> > struct something {
> >     //typically we do have more parameters
> >     int val;
> >     u8 buffer[8] ____cacheline_aligned;
> > };
> > 
> > I thing the idea is to replace this by the compile time annotation
> > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that
> > kmalloc(sizeof(struct something)) will put us already in the
> > kmalloc-
> > 256 cache (right?) and then we will just get 256 aligned objects as
> > this is a power of 2. The point is that as we normally do the
> > compile
> > time annotation we should still have the padding bytes as given by
> > sizeof().
> > 
> > If I understood things correctly, the above series is way more
> > beneficial for smaller kmalloc calls (without annotation) that
> > won't
> > need the 128 align constrain or for structures with zero length
> > arrays
> > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up in
> > the
> > kmalloc-192 cache.
> > 
> > Does the above make any sense?
> If we keep the padding between iio_dev and iio_priv at
> ARCH_DMA_MINALIGN
> then we should be fine because we get
> 

Sure... I was just thinking in a way of saving some space. But
ARCH_DMA_MINALIGN is probably the safest way to go if we want to make
sure things are right. For the record I was not even thinking in
KMALLOC_MINALIGN but using something like cache_line_size() to get the
__real__ size. Even though I think KMALLOC_MINALIGN would work even
though our cache line might be 128. That will be already the case for
objects with KMALLOC_MINALIGN alignment and with buffers annotated with
ARCH_DMA_MINALIGN.

> 0  iio_dev
> 
> 128 start of iio_priv
> 
> 256 start of aligned region (after changing to ARCH_DMA_MINALIGN as
> you say above.
> 383 end of potential DMA Safe region - so need to not have anything
> else
> before this.
> 
> and I think we get in kmalloc-384 cache if it exists.  Which is fine.

Probably we end up in 512 which is also fine :)

> > 
> > That said, I think that we also need to be careful in
> > iio_device_alloc(). Ideally, to save some space, we only align to a
> > real cache line (padding between iio and private struct) and not
> > ARCH_DMA_MINALIGN. Would it be enough?
> 
> For now I suggest stick to ARCH_DMA_MINALIGN but in future we may be
> able
> to relax that.  We do however run into the same issue crypto does
> of giving missleading information to the compiler (even it if it's
> a non issue in reality) because the structures at iio_priv() will
> still have compile time alignment markings to ARCH_DMA_MINALIGN
> and if the padding of struct iio_dev doesn't bring us to that
> boundary
> then the alignment isn't what he compiler thinks it is.

I think in our case this might not be a problem as I think we will
always end up in a kmalloc cache where the iio_dev alignment will
always be bigger than ARCH_DMA_MINALIGN (multiple of) so iio_priv()
should always start at an ARCH_DMA_MINALIGN aligned address (not sure
though). If I understood your point correctly this is already what will
happen in other objects after that series because we can mark a DMA
buffer with 128 alignment and the container object might end up with
64. Apparently no issue observed so far...

> I'm not immediately sure how else to get that alignment as we don't
> want to be hand adding padding to each driver..  We could do
> something
> magic like introducing a second region with iio_priv2() but that
> is horrible.
> 
> Walking through this.  If KMALLOC_MINALIGN
> falls to a small enough number then we 'could get'
> 
> 0                iio_Dev
> 
> x + delta        start of iio_priv
> x + delta + 128  start of aligned region
> x + delta + 256  end of aligned region.
> -- any issue will occur because back end of this shares the min
> dma aligned region with something else.
> 
> So at worst we get into the 384 size kmalloc cache and should be
> fine anyway.
> 
> Note that we don't care that much about potential savings as there
> is only one of these structures per driver instance so it'll be very
> minor.
> 

Given the above point, agreed with using ARCH_DMA_MINALIGN (also safer
as I'm not 100% sure about what I have been saying :))

> > 
> > > Given we have no reports of a problem with 128 byte non DMA
> > > coherent
> > > platforms
> > > I don't propose to 'fix' this until we can make use of the new
> > > define
> > > in the above patch set.  That is going to make a mess of
> > > backporting
> > > the
> > > fix however.  I'm wishing we did what crypto has done and had a
> > > subsystem
> > > specific define for this but such is life.  We will also want to
> > > be  
> > 
> > We do have IIO_ALIGN but as we just found out is wrongly defined
> > and
> > was never enforced. Maybe now is a good time for enforcing it :)
> 
> Agreed. I'll spin a series to change that to be inline with what
> crypto
> does and use throughout the drivers.  Then we can move forwards from
> there as Catalin's series moves forwards.
> 
> What fun :)
> 

Cool, I'll help with reviewing...

- Nuno Sá