On Wed, Jun 08, 2022 at 10:48:41AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > The simplest fix (diff for arm64 below) seems to be changing the
> > invalidation in this case to be a "clean" in arm(64)-speak so that any
> > dirty lines are written back, therefore limiting the stale data to the
> > initial buffer contents. In doing so, this makes the FROM_DEVICE and
> > BIDIRECTIONAL cases identical, which makes some intuitive sense if you
> > think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
> > lines. One interesting thing I noticed is that the csky implementation
> > additionally zeroes the buffer prior to the clean, but this seems to be
> > overkill.
>
> Btw, one thing I'd love to do (and might need some help from the arch
> maintainers) is to change how the DMA cache maintenance hooks work.
>
> Right now they are high-level, and these kinds of decisions need to
> be taken in the arch code. I'd prefer to move over to the architectures
> providing very low-level helpers to:
>
>  - writeback
>  - invalidate
>  - invalidate+writeback
>
> Note arch/arc/mm/dma.c has a very nice documentation of what we need to
> do, based on a mail from Russell, and we should keep it up to date with
> any changes to the status quo and probably move it to common
> documentation at least.

That makes sense to me (assuming an opt-out for archs that want it), but
I'd like to make sure that these low-level helpers aren't generally
available for e.g. driver modules to dip into directly; it's pretty
common for folks to request that we EXPORT our cache maintenance
routines because they're violating the DMA API someplace, and so far
we've been pretty good at asking them to fix their code instead.

> > Finally, on arm(64), the DMA mapping code tries to deal with buffers
> > that are not cacheline aligned by issuing clean-and-invalidate
> > operations for the overlapping portions at each end of the buffer. I
> > don't think this makes a tonne of sense, as inevitably one of the
> > writers (either the CPU or the DMA) is going to win and somebody is
> > going to run into silent data loss. Having the caller receive
> > DMA_MAPPING_ERROR in this case would probably be better.
>
> Yes, the mappings are supposed to be cache line aligned, or at least
> have padding around them. But due to the latter case we can't really
> easily verify this in dma-debug.

Damn, I hadn't considered padding. That probably means that the csky
implementation that I mentioned (which zeroes the buffer) is buggy...

Will
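[Editor's note: the direction-to-operation policy discussed in this thread
can be modelled in a small user-space sketch. All names below
(cache_op, op_for_device, op_for_cpu) are invented for illustration; this
is not the kernel's actual API, only a rough model of Christoph's proposed
writeback/invalidate/invalidate+writeback split and of the FROM_DEVICE fix
Will describes.]

```c
#include <assert.h>

/* Hypothetical low-level ops an architecture would provide under the
 * proposal above (names invented for illustration, not a real API). */
enum cache_op {
	CACHE_NONE,			/* no maintenance required        */
	CACHE_WRITEBACK,		/* "clean": write dirty lines out */
	CACHE_INVALIDATE,		/* discard lines, no writeback    */
	CACHE_WRITEBACK_INVALIDATE,	/* clean, then discard            */
};

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/* Common code, rather than each arch, would then pick the operation.
 * Per the fix discussed above, mapping a buffer for the device uses a
 * clean even for FROM_DEVICE, so a dirty line can never be written back
 * on top of data the device has since DMA'd into the buffer; this makes
 * FROM_DEVICE and BIDIRECTIONAL identical at map time. */
static enum cache_op op_for_device(enum dma_dir dir)
{
	(void)dir;
	return CACHE_WRITEBACK;
}

/* When handing the buffer back to the CPU, any lines speculatively
 * fetched during the transfer must be discarded, except for TO_DEVICE
 * where the CPU's copy is still the authoritative one. */
static enum cache_op op_for_cpu(enum dma_dir dir)
{
	return dir == DMA_TO_DEVICE ? CACHE_NONE : CACHE_INVALIDATE;
}
```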
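[Editor's note: the alignment check implied by returning DMA_MAPPING_ERROR
for unaligned buffers could look roughly like the sketch below. The
function name and the 64-byte line size are illustrative only; note that,
as Christoph points out, such a check cannot distinguish an unaligned
buffer that is genuinely unsafe from one that merely has padding around
it, which is exactly why dma-debug cannot easily verify this.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* illustrative; think ARCH_DMA_MINALIGN */

/* Hypothetical check: a buffer whose start or length is not a multiple
 * of the cache line size may share a line with unrelated data, so the
 * mapping code could reject it (i.e. return DMA_MAPPING_ERROR) instead
 * of issuing partial clean-and-invalidate ops at each end. */
static bool dma_buffer_misaligned(uintptr_t addr, size_t len)
{
	return ((addr | len) & (CACHE_LINE_SIZE - 1)) != 0;
}
```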