Re: Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()

On Wed, Jun 08, 2022 at 10:48:41AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 06, 2022 at 04:21:50PM +0100, Will Deacon wrote:
> > The simplest fix (diff for arm64 below) seems to be changing the
> > invalidation in this case to be a "clean" in arm(64)-speak so that any
> > dirty lines are written back, therefore limiting the stale data to the
> > initial buffer contents. In doing so, this makes the FROM_DEVICE and
> > BIDIRECTIONAL cases identical which makes some intuitive sense if you
> > think of FROM_DEVICE as first doing a TO_DEVICE of any dirty CPU cache
> > lines. One interesting thing I noticed is that the csky implementation
> > additionally zeroes the buffer prior to the clean, but this seems to be
> > overkill.
> 
> Btw, one thing I'd love to do (and might need some help from the arch
> maintainers with) is to change how the dma cache maintenance hooks work.
> 
> Right now they are high-level and these kinds of decisions need to
> be taken in the arch code.  I'd prefer to move over to the architectures
> providing very low-level helpers to:
> 
>   - writeback
>   - invalidate
>   - invalidate+writeback
> 
> Note arch/arc/mm/dma.c has very nice documentation of what we need to
> do, based on a mail from Russell, and we should keep it up to date with
> any changes to the status quo and probably move it to common
> documentation at least.

Note that simply devolving the operations to this set is not optimal.
If you notice, both my email and the table that was copied from my
email make two of the invalidate options dependent on the properties
of the CPU cache architecture.

While we could invalidate anyway at that point, this just fuels the
view that generic code == non-optimal, performance degrading code.
This is why the underlying interfaces on 32-bit ARM are not these
cache operations, but are instead based on buffer ownership
transitions - __dma_page_cpu_to_dev() and __dma_page_dev_to_cpu()
are the two things that are really needed.
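
To make the ownership-transition idea concrete, here is a minimal
userspace sketch. It is not the code in arch/arm/mm/dma-mapping.c: the
names buf_cpu_to_dev(), buf_dev_to_cpu(), the cpu_speculates flag and the
trace buffer are all hypothetical, and the choice of operation per
direction is an assumption loosely modelled on the status-quo table
mentioned above, with the invalidates on return to the CPU taken to be
the ones that depend on the cache architecture. The point is only that
the caller states who owns the buffer next, and the architecture decides
which maintenance, if any, that implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's DMA data direction. */
enum xfer_dir { XFER_TO_DEVICE, XFER_FROM_DEVICE, XFER_BIDIRECTIONAL };

/* Records which (pretend) cache operations were issued, in order. */
static char trace[128];

static void record(const char *name)
{
	/* Room for the name, a separating space and the terminating NUL. */
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	strcat(trace, name);
	strcat(trace, " ");
}

/* Ownership passes CPU -> device: the arch picks the operation. */
static void buf_cpu_to_dev(enum xfer_dir dir)
{
	switch (dir) {
	case XFER_TO_DEVICE:
		record("wb");		/* push dirty lines out to memory */
		break;
	case XFER_FROM_DEVICE:
		record("inv");		/* assumption: status-quo behaviour */
		break;
	case XFER_BIDIRECTIONAL:
		record("wb+inv");
		break;
	}
}

/*
 * Ownership passes device -> CPU. In this model the invalidate is only
 * issued when the CPU may have speculatively refilled lines while the
 * device owned the buffer; a low-level "invalidate" helper called from
 * generic code could not make that decision.
 */
static void buf_dev_to_cpu(enum xfer_dir dir, bool cpu_speculates)
{
	if (dir == XFER_TO_DEVICE)
		return;			/* device never wrote the buffer */
	if (cpu_speculates)
		record("inv");
}
```

In this model a FROM_DEVICE transfer costs one invalidate on a
non-speculating CPU and two on a speculating one, without the caller
needing to know which kind of CPU it is running on.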

There's also additional code that deals with the d-cache state
(which is itself architecture specific, and which avoids useless
d-cache maintenance when page cache pages get mapped into userspace),
as well as the complexities of dealing with more than one level of
cache - where the order of the inner and outer cache maintenance
can't be expressed in terms of the simple functions you mention above.
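
The ordering point can be illustrated with another small userspace
model. Everything here is hypothetical (the inner_*() and outer_*()
helpers, hand_to_device(), hand_to_cpu() and the sequence buffer); it
only assumes a two-level hierarchy in which the inner and outer caches
are maintained by separate operations. Cleaning goes inner first, so
that the outer clean sees the data the inner cache pushed out, while
invalidation on the way back goes outer first, so that an inner refill
cannot pick up a stale outer line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's DMA data direction. */
enum buf_dir { BUF_TO_DEV, BUF_FROM_DEV, BUF_BIDIR };

/* Records the order in which per-level operations were requested. */
static char seq[128];

static void note(const char *s)
{
	/* Room for the name, a separating space and the terminating NUL. */
	assert(strlen(seq) + strlen(s) + 2 <= sizeof(seq));
	strcat(seq, s);
	strcat(seq, " ");
}

/* Hypothetical per-level primitives; they only log the request. */
static void inner_clean(void) { note("inner:clean"); }
static void inner_inv(void)   { note("inner:inv"); }
static void outer_clean(void) { note("outer:clean"); }
static void outer_inv(void)   { note("outer:inv"); }

/* CPU -> device: always work from the inner cache outwards. */
static void hand_to_device(enum buf_dir dir)
{
	if (dir == BUF_FROM_DEV) {
		inner_inv();
		outer_inv();
	} else {
		inner_clean();
		outer_clean();
	}
}

/* Device -> CPU: work from the outer cache inwards. */
static void hand_to_cpu(enum buf_dir dir)
{
	if (dir == BUF_TO_DEV)
		return;		/* nothing was written by the device */
	outer_inv();
	inner_inv();
}
```

A single generic "invalidate" helper has no way to say that the two
directions need opposite level orderings; an ownership-transition hook
can encode that internally.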

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


