On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote: > On Wed, 20 Apr 2022 14:30:36 +0200 > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote: > > > Hi All, > > > > > > For a long time IIO has been making assumption that > > > ____cacheline_aligned > > > was sufficient to ensure buffers were in their own cacheline and > > > hence > > > DMA safe. We generally needed this for all SPI device drivers as > > > many > > > SPI ABI calls can pass the buffer directly through to be used for > > > DMA. > > > Not that regmap doesn't currently do this but it might in future > > > (and > > > did > > > in the past). I can't remember the history of this well enough > > > to > > > know > > > why we did it that way. I don't remember the pain of debugging > > > random > > > corruption caused by getting it wrong however... > > > > > > However it turns out via > > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@xxxxxxx/ > > > > > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the > > > cache line size" > > > discussion that there are platforms where this isn't true and > > > need > > > 128 byte alignment despite a ____cacheline_size aligning to 64 > > > bytes > > > on > > > all ARM64 platforms. > > > > > > > Oh boy... Here it goes my 5 cents on this > > :) > > > > > > (I should probably have known that as the platform I support in > > > my > > > day > > > job has 128 byte cachelines in L3 - however it's DMA coherent so > > > that's not a problem). > > > > > > The above series also highlights that we can do much better > > > anyway on > > > some platforms > > > using the new ARCH_DMA_MINALIGN (currently it's only defined for > > > some > > > archs but > > > after that patch everyone gets it). We should be safe to use > > > that > > > everywhere > > > we currently force ___cachline_aligned and waste a little less > > > space > > > on padding > > > which is always nice ;) > > > > > > > The above series actually made me go an try to have some more > > understanding on kmalloc. I might be completely wrong but AFAIU, > > for > > our typical case, this won't matter much. Typically we have > > something > > like: > > > > struct something { > > //typically we do have more parameters > > int val; > > u8 buffer[8] ____cacheline_aligned; > > }; > > > > I thing the idea is to replace this by the compile time annotation > > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that > > kmalloc(sizeof(struct something)) will put us already in the > > kmalloc- > > 256 cache (right?) and then we will just get 256 aligned objects as > > this is a power of 2. The point is that as we normally do the > > compile > > time annotation we should still have the padding bytes as given by > > sizeof(). > > > > If I understood things correctly, the above series is way more > > beneficial for smaller kmalloc calls (without annotation) that > > won't > > need the 128 align constrain or for structures with zero length > > arrays > > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up in > > the > > kmalloc-192 cache. > > > > Does the above make any sense? > If we keep the padding between iio_dev and iio_priv at > ARCH_DMA_MINALIGN > then we should be fine because we get > Sure... I was just thinking in a way of saving some space. But ARCH_DMA_MINALIGN is probably the safest way to go if we want to make sure things are right. For the record I was not even thinking in KMALLOC_MINALIGN but using something like cache_line_size() to get the __real__ size. Even though I think KMALLOC_MINALIGN would work even though our cache line might be 128. That will be already the case for objects with KMALLOC_MINALIGN alignment and with buffers annotated with ARCH_DMA_MINALIGN. > 0 iio_dev > > 128 start of iio_priv > > 256 start of aligned region (after changing to ARCH_DMA_MINALIGN as > you say above. > 383 end of potential DMA Safe region - so need to not have anything > else > before this. > > and I think we get in kmalloc-384 cache if it exists. Which is fine. Probably we end up in 512 which is also fine :) > > > > That said, I think that we also need to be careful in > > iio_device_alloc(). Ideally, to save some space, we only align to a > > real cache line (padding between iio and private struct) and not > > ARCH_DMA_MINALIGN. Would it be enough? > > For now I suggest stick to ARCH_DMA_MINALIGN but in future we may be > able > to relax that. We do however run into the same issue crypto does > of giving missleading information to the compiler (even it if it's > a non issue in reality) because the structures at iio_priv() will > still have compile time alignment markings to ARCH_DMA_MINALIGN > and if the padding of struct iio_dev doesn't bring us to that > boundary > then the alignment isn't what he compiler thinks it is. I think in our case this might not be a problem as I think we will always end up in a kmalloc cache where the iio_dev alignment will always be bigger than ARCH_DMA_MINALIGN (multiple of) so iio_priv() should always start at an ARCH_DMA_MINALIGN aligned address (not sure though). If I understood your point correctly this is already what will happen in other objects after that series because we can mark a DMA buffer with 128 alignment and the container object might end up with 64. Apparently no issue observed so far... > I'm not immediately sure how else to get that alignment as we don't > want to be hand adding padding to each driver.. We could do > something > magic like introducing a second region with iio_priv2() but that > is horrible. > > Walking through this. If KMALLOC_MINALIGN > falls to a small enough number then we 'could get' > > 0 iio_Dev > > x + delta start of iio_priv > x + delta + 128 start of aligned region > x + delta + 256 end of aligned region. > -- any issue will occur because back end of this shares the min > dma aligned region with something else. > > So at worst we get into the 384 size kmalloc cache and should be > fine anyway. > > Note that we don't care that much about potential savings as there > is only one of these structures per driver instance so it'll be very > minor. > Given the above point, agreed with using ARCH_DMA_MINALIGN (also safer as I'm not 100% sure about what I have been saying :)) > > > > > Given we have no reports of a problem with 128 byte non DMA > > > coherent > > > platforms > > > I don't propose to 'fix' this until we can make use of the new > > > define > > > in the above patch set. That is going to make a mess of > > > backporting > > > the > > > fix however. I'm wishing we did what crypto has done and had a > > > subsystem > > > specific define for this but such is life. We will also want to > > > be > > > > We do have IIO_ALIGN but as we just found out is wrongly defined > > and > > was never enforced. Maybe now is a good time for enforcing it :) > > Agreed. I'll spin a series to change that to be inline with what > crypto > does and use throughout the drivers. Then we can move forwards from > there as Catalin's series moves forwards. > > What fun :) > Cool, I'll help with reviewing... - Nuno Sá