On Thu, 2022-04-21 at 12:35 +0200, Nuno Sá wrote: > On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote: > > On Wed, 20 Apr 2022 14:30:36 +0200 > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote: > > > > Hi All, > > > > > > > > For a long time IIO has been making assumption that > > > > ____cacheline_aligned > > > > was sufficient to ensure buffers were in their own cacheline > > > > and > > > > hence > > > > DMA safe. We generally needed this for all SPI device drivers > > > > as > > > > many > > > > SPI ABI calls can pass the buffer directly through to be used > > > > for > > > > DMA. > > > > Not that regmap doesn't currently do this but it might in > > > > future > > > > (and > > > > did > > > > in the past). I can't remember the history of this well enough > > > > to > > > > know > > > > why we did it that way. I don't remember the pain of debugging > > > > random > > > > corruption caused by getting it wrong however... > > > > > > > > However it turns out via > > > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/ > > > > > > > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below > > > > the > > > > cache line size" > > > > discussion that there are platforms where this isn't true and > > > > need > > > > 128 byte alignment despite a ____cacheline_size aligning to 64 > > > > bytes > > > > on > > > > all ARM64 platforms. > > > > > > > > > > Oh boy... Here it goes my 5 cents on this > > > > :) > > > > > > > > > (I should probably have known that as the platform I support in > > > > my > > > > day > > > > job has 128 byte cachelines in L3 - however it's DMA coherent > > > > so > > > > that's not a problem). > > > > > > > > The above series also highlights that we can do much better > > > > anyway on > > > > some platforms > > > > using the new ARCH_DMA_MINALIGN (currently it's only defined > > > > for > > > > some > > > > archs but > > > > after that patch everyone gets it). We should be safe to use > > > > that > > > > everywhere > > > > we currently force ___cachline_aligned and waste a little less > > > > space > > > > on padding > > > > which is always nice ;) > > > > > > > > > > The above series actually made me go an try to have some more > > > understanding on kmalloc. I might be completely wrong but AFAIU, > > > for > > > our typical case, this won't matter much. Typically we have > > > something > > > like: > > > > > > struct something { > > > //typically we do have more parameters > > > int val; > > > u8 buffer[8] ____cacheline_aligned; > > > }; > > > > > > I thing the idea is to replace this by the compile time > > > annotation > > > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that > > > kmalloc(sizeof(struct something)) will put us already in the > > > kmalloc- > > > 256 cache (right?) and then we will just get 256 aligned objects > > > as > > > this is a power of 2. The point is that as we normally do the > > > compile > > > time annotation we should still have the padding bytes as given > > > by > > > sizeof(). > > > > > > If I understood things correctly, the above series is way more > > > beneficial for smaller kmalloc calls (without annotation) that > > > won't > > > need the 128 align constrain or for structures with zero length > > > arrays > > > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up > > > in > > > the > > > kmalloc-192 cache. > > > > > > Does the above make any sense? > > If we keep the padding between iio_dev and iio_priv at > > ARCH_DMA_MINALIGN > > then we should be fine because we get > > > > Sure... I was just thinking in a way of saving some space. But > ARCH_DMA_MINALIGN is probably the safest way to go if we want to make > sure things are right. For the record I was not even thinking in > KMALLOC_MINALIGN but using something like cache_line_size() to get > the > __real__ size. Even though I think KMALLOC_MINALIGN would work even > though our cache line might be 128. That will be already the case for Well, this is total nonsense from me... KMALLOC_MINALIGN will fail miserably if our cache line is bigger than it. For arm64 that would be on KMALLOC_MINALIGN==64 and cache line==128. That said, aligning iio_priv() to a cache line would be safe and we could save some space in case the line is < ARCH_DMA_MINALIGN (which typically is). Anyways, as Jonathan pointed out we typically just have one IIO object per driver so the hassle is probably not worth it. - Nuno Sá