On Tue, 2017-01-17 at 16:59 +0100, Jan Kara wrote:
> On Fri 13-01-17 17:20:08, Ross Zwisler wrote:
 :
> > - If I recall correctly, at one point Dave Chinner suggested that
> >   we change DAX so that I/O would use cached stores instead of the
> >   non-temporal stores that it currently uses.  We would then track
> >   pages that were written to by DAX in the radix tree so that they
> >   would be flushed later during fsync/msync.  Does this sound like
> >   a win?  Also, assuming that we can find a solution for platforms
> >   where the processor cache is part of the ADR safe zone (above
> >   topic) this would be a clear improvement, moving us from using
> >   non-temporal stores to faster cached stores with no downside.
>
> I guess this needs measurements. But it is worth a try.

Brian Boylston did some measurements earlier.
http://oss.sgi.com/archives/xfs/2016-08/msg00239.html

I updated his test program to skip pmem_persist() for the cached copy
case.

                        dst = dstbase;
+ #if 0 /* see note above */
                        if (mode == 'c')
                                pmem_persist(dst, dstsz);
+ #endif
                }

Here are sample runs:

$ numactl -N0 time -p ./memcpyperf c /mnt/pmem0/file 1000000
INFO: dst 0x7f1d00000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 3.28
user 3.27
sys 0.00

$ numactl -N0 time -p ./memcpyperf n /mnt/pmem0/file 1000000
INFO: dst 0x7f6080000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 1.27
user 1.01
sys 0.00

$ numactl -N1 time -p ./memcpyperf c /mnt/pmem0/file 1000000
INFO: dst 0x7fe900000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 4.06
user 4.06
sys 0.00

$ numactl -N1 time -p ./memcpyperf n /mnt/pmem0/file 1000000
INFO: dst 0x7f7640000000 src 0x601200 dstsz 2756509696 cpysz 16384
real 1.27
user 1.27
sys 0.00

In this simple test, using non-temporal copy is still faster than using
cached copy, even with pmem_persist() skipped for the cached case.

Thanks,
-Toshi