On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote: > Hi All, > > For a long time IIO has been making assumption that > ____cacheline_aligned > was sufficient to ensure buffers were in their own cacheline and > hence > DMA safe. We generally needed this for all SPI device drivers as > many > SPI ABI calls can pass the buffer directly through to be used for > DMA. > Not that regmap doesn't currently do this but it might in future (and > did > in the past). I can't remember the history of this well enough to > know > why we did it that way. I don't remember the pain of debugging random > corruption caused by getting it wrong however... > > However it turns out via > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/ > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the > cache line size" > discussion that there are platforms where this isn't true and need > 128 byte alignment despite a ____cacheline_size aligning to 64 bytes > on > all ARM64 platforms. > Oh boy... Here it goes my 5 cents on this > (I should probably have known that as the platform I support in my > day > job has 128 byte cachelines in L3 - however it's DMA coherent so > that's not a problem). > > The above series also highlights that we can do much better anyway on > some platforms > using the new ARCH_DMA_MINALIGN (currently it's only defined for some > archs but > after that patch everyone gets it). We should be safe to use that > everywhere > we currently force ___cachline_aligned and waste a little less space > on padding > which is always nice ;) > The above series actually made me go an try to have some more understanding on kmalloc. I might be completely wrong but AFAIU, for our typical case, this won't matter much. Typically we have something like: struct something { //typically we do have more parameters int val; u8 buffer[8] ____cacheline_aligned; }; I thing the idea is to replace this by the compile time annotation ARCH_DMA_MINALIGN which is 128 (for arm64). That means that kmalloc(sizeof(struct something)) will put us already in the kmalloc- 256 cache (right?) and then we will just get 256 aligned objects as this is a power of 2. The point is that as we normally do the compile time annotation we should still have the padding bytes as given by sizeof(). If I understood things correctly, the above series is way more beneficial for smaller kmalloc calls (without annotation) that won't need the 128 align constrain or for structures with zero length arrays (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up in the kmalloc-192 cache. Does the above make any sense? That said, I think that we also need to be careful in iio_device_alloc(). Ideally, to save some space, we only align to a real cache line (padding between iio and private struct) and not ARCH_DMA_MINALIGN. Would it be enough? > Given we have no reports of a problem with 128 byte non DMA coherent > platforms > I don't propose to 'fix' this until we can make use of the new define > in the above patch set. That is going to make a mess of backporting > the > fix however. I'm wishing we did what crypto has done and had a > subsystem > specific define for this but such is life. We will also want to be We do have IIO_ALIGN but as we just found out is wrongly defined and was never enforced. Maybe now is a good time for enforcing it :) > - Nuno Sá