Re: [patch 4/4] dm-writecache: use new API for flushing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Fri, 25 May 2018, Dan Williams wrote:

> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > On Fri, May 25 2018 at  2:17am -0400,
> > Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >
> >> On Thu, 24 May 2018, Dan Williams wrote:
> >>
> >> > I don't want to grow driver-local wrappers for pmem. You should use
> >> > memcpy_flushcache directly() and if an architecture does not define
> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this
> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't
> >> > see a need to add a standalone flush operation if all relevant archs
> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb()
> >> > directly since all archs define it. Alternatively we could introduce
> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy
> >> > routine and memcpy_flushcache() would imply a wmb().
> >>
> >> But memcpy_flushcache() on ARM64 is slow.
> 
> Right, so again, what is wrong with memcpy_flushcache_relaxed() +
> wmb() or otherwise making memcpy_flushcache() ordered. I do not see
> that as a trailblazing requirement, I see that as typical review and a
> reduction of the operation space that you are proposing.

memcpy_flushcache on ARM64 is generally wrong thing to do, because it is 
slower than memcpy and explicit cache flush.

Suppose that you want to write data to a block device and make it 
persistent. So you send a WRITE bio and then a FLUSH bio.

Now - how to implement these two bios on persistent memory:

On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does 
wmb() - this is the optimal implementation.

But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal 
implementation is that the WRITE bio does just memcpy() and the FLUSH bio 
does arch_wb_cache_pmem() on the affected range.

Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture 
doesn't have non-temporal stores. So, memcpy_flushcache() is implemented 
as memcpy() followed by a cache flush.

Now - if you flush the cache immediatelly after memcpy, the cache is full 
of dirty lines and the cache-flushing code has to write these lines back 
and that is slow.

If you flush the cache some time after memcpy (i.e. when the FLUSH bio is 
received), the processor already flushed some part of the cache on its 
own, so the cache-flushing function has less work to do and it is faster.

So the conclusion is - don't use memcpy_flushcache on ARM. This problem 
cannot be fixed by a better implementation of memcpy_flushcache.

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux