On Wed, May 30 2018 at 9:07am -0400,
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

>
>
> On Mon, 28 May 2018, Dan Williams wrote:
>
> > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > >
> > > I measured it (with nvme backing store) and late cache flushing has 12%
> > > better performance than eager flushing with memcpy_flushcache().
> >
> > I assume what you're seeing is ARM64 over-flushing the amount of dirty
> > data so it becomes more efficient to do an amortized flush at the end?
> > However, that effectively makes memcpy_flushcache() unusable in the
> > way it can be used on x86. You claimed that ARM does not support
> > non-temporal stores, but it does, see the STNP instruction. I do not
> > want to see arch specific optimizations in drivers, so either
> > write-through mappings is a potential answer to remove the need to
> > explicitly manage flushing, or just implement STNP hacks in
> > memcpy_flushcache() like you did with MOVNT on x86.
> >
> > > 131836 4k iops - vs - 117016.
> >
> > To be clear this is memcpy_flushcache() vs memcpy + flush?
>
> I found out what caused the difference. I used dax_flush on the version of
> dm-writecache that I had on the ARM machine (with the kernel 4.14, because
> it is the last version where dax on ramdisk works) - and I thought that
> dax_flush flushes the cache, but it doesn't.
>
> When I replaced dax_flush with arch_wb_cache_pmem, the performance
> difference between early flushing and late flushing disappeared.
>
> So I think we can remove this per-architecture switch from dm-writecache.

That is really great news, can you submit an incremental patch that
layers on top of the linux-dm.git 'dm-4.18' branch?

Thanks,
Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
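
For reference, a minimal sketch of the two write paths whose performance is
compared in the thread above: "early" flushing with memcpy_flushcache()
versus a plain copy followed by one amortized cache write-back. This is not
dm-writecache's actual code; the helper names copy_eager() and copy_late()
are invented for illustration, and only the kernel APIs named in the thread
(memcpy_flushcache, arch_wb_cache_pmem) are assumed.

/*
 * Hypothetical sketch, not the dm-writecache implementation: contrasts
 * eager (flush-as-you-copy) and late (amortized) cache flushing.
 */
#include <linux/string.h>	/* memcpy(), memcpy_flushcache() */
#include <linux/libnvdimm.h>	/* arch_wb_cache_pmem() */
#include <asm/barrier.h>	/* wmb() */

/* Early flushing: copy and flush in one pass (MOVNT stores on x86). */
static void copy_eager(void *pmem_dst, const void *src, size_t len)
{
	memcpy_flushcache(pmem_dst, src, len);
	wmb();	/* order the stores ahead of any later commit write */
}

/*
 * Late flushing: plain cached copy, then one write-back over the whole
 * range.  Note the explicit arch_wb_cache_pmem() call; dax_flush() did
 * not actually flush in the 4.14 dax-on-ramdisk setup described above.
 */
static void copy_late(void *pmem_dst, const void *src, size_t len)
{
	memcpy(pmem_dst, src, len);
	arch_wb_cache_pmem(pmem_dst, len);
	wmb();
}

With dax_flush() silently skipping the write-back, the "late" path was
effectively just a memcpy(), which would explain the 12% gap; once
arch_wb_cache_pmem() is used, both paths do comparable work and the
per-architecture switch is no longer needed.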