Re: [PATCH] Revert "MIPS: cache: Provide cache flush operations for XFS"

On Mon, Apr 9, 2012 at 9:13 AM, Steven J. Hill <sjhill@xxxxxxxx> wrote:
> This reverts commit d759e6bf49a41edd66de7c6480b34cceb88ee29a.

It may be clearer to say "refactors" instead of "reverts," as this
patch reimplements the same feature in a different way.

If the change were completely reverting the MIPS *_kernel_vmap_range
functions, that would make me very uneasy, as XFS will not run reliably
on my systems without them.

Also, the commit hash is d9cdc901af0f92da7f90c750d8c187f5500be067 in
Linus' tree.

> The
> flush_kernel_vmap_range() and invalidate_kernel_vmap_range() are DMA
> cache flushing operations and should be done via _dma_cache_wback_inv()
> and _dma_cache_inv().

I think there is some ambiguity here.

Documentation/cachetlb.txt says:

"Since kernel I/O goes via physical pages, the I/O subsystem assumes
that the user mapping and kernel offset mapping are the only aliases.
This isn't true for vmap aliases, so anything in the kernel trying to
do I/O to vmap areas must manually manage coherency.  It must do this
by flushing the vmap range before doing I/O and invalidating it after
the I/O returns."

"The design is to make this area safe to perform I/O on."

This appears to support your statement.
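For reference, that rule amounts to something like the following (a
minimal sketch of the documented protocol; the wrapper function and its
arguments are made up, only the two flush calls are the real interface):

	/*
	 * Sketch of the cachetlb.txt rule for I/O to a vmap'd buffer.
	 * do_io_on_vmap_buffer() is a hypothetical name.
	 */
	static void do_io_on_vmap_buffer(void *vaddr, int size)
	{
		/* write back dirty vmap-alias lines before the device reads */
		flush_kernel_vmap_range(vaddr, size);

		/* ... perform the I/O on the underlying physical pages ... */

		/* discard stale vmap-alias lines after the device has written */
		invalidate_kernel_vmap_range(vaddr, size);
	}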

However, actual usage suggests that the *_kernel_vmap_range functions
are used much higher up, at the filesystem layer, just to get the data
out of the "wrong-colored" D$ lines on systems with cache aliases:

		if (xfs_buf_is_vmapped(bp)) {
			flush_kernel_vmap_range(bp->b_addr,
						xfs_buf_vmap_len(bp));
		}
		submit_bio(rw, bio);
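
and the read side does the matching invalidate on completion (roughly,
from xfs_buf_bio_end_io() in the same file):

		if (xfs_buf_is_vmapped(bp)) {
			invalidate_kernel_vmap_range(bp->b_addr,
						     xfs_buf_vmap_len(bp));
		}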

Then eventually, something like dma_map_sg() is called from a
lower-level driver (a la ata_sg_setup()) to perform the full L1+L2
cache flush on each page involved in the transaction.
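
That is, the usual pattern is roughly this (a sketch, not taken from
any particular driver; dev/sg/nents stand in for the driver's real
state):

	int count;

	/* performs the per-page L1+L2 cache maintenance for the transfer */
	count = dma_map_sg(dev, sg, nents, DMA_TO_DEVICE);
	if (!count)
		return -ENOMEM;

	/* ... program the hardware via sg_dma_address()/sg_dma_len() ... */

	dma_unmap_sg(dev, sg, nents, DMA_TO_DEVICE);	/* same nents as mapped */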

I would also note that on ARM these operations turn into NOPs on VIPT
non-aliasing CPUs, even if their caches are noncoherent:

static inline void flush_kernel_vmap_range(void *addr, int size)
{
	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
	  __cpuc_flush_dcache_area(addr, (size_t)size);
}
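
A MIPS implementation gated the same way would presumably look
something like this (a sketch only; cpu_has_dc_aliases is the real
MIPS feature test, but the flush helper named here is made up):

	static inline void flush_kernel_vmap_range(void *vaddr, int size)
	{
		/* only aliasing D-caches need the extra writeback */
		if (cpu_has_dc_aliases)
			__flush_vmap_dcache_range(vaddr, size); /* hypothetical */
	}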

I suspect that reimplementing the *_kernel_vmap_range functions using
_dma_cache_* would result in a double L2 flush on the same memory
regions on systems with cache aliases, and an unnecessary L1+L2 flush
on systems without aliases.
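
That is, on an aliasing system the write path would look roughly like
this (my reading of the code, not a measured trace):

	XFS write submission:
		flush_kernel_vmap_range() -> _dma_cache_wback_inv()  [full L1+L2]
		submit_bio()
			...
			dma_map_sg()      -> L1+L2 writeback per page [again]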

I don't see any way that the *_kernel_vmap_range functions could be
called in lieu of the dma_map_* functions, as they are missing
critical elements such as "returning the mapped PA" and calls to the
plat_* functions.  So, any memory getting flushed through
*_kernel_vmap_range will still need to use dma_map_* for DMA.
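
Comparing the prototypes makes the mismatch clear:

	/* no device, no return value -- cache maintenance only */
	void flush_kernel_vmap_range(void *vaddr, int size);

	/* must know the device and hand back a bus address for it */
	dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
				  size_t size, enum dma_data_direction direction);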

Is there a reason why Ralf's original approach was not workable?


