On Tue, 2016-08-30 at 17:01 -0600, Ross Zwisler wrote:
> On Tue, Aug 23, 2016 at 04:04:10PM -0600, Ross Zwisler wrote:
> >
> > DAX PMDs have been disabled since Jan Kara introduced DAX radix
> > tree based locking.  This series allows DAX PMDs to participate in
> > the DAX radix tree based locking scheme so that they can be
> > re-enabled.
> >
> > Changes since v1:
> >  - PMD entry locking is now done based on the starting offset of
> >    the PMD entry, rather than on the radix tree slot which was
> >    unreliable. (Jan)
> >  - Fixed the one issue I could find with hole punch.  As far as I
> >    can tell hole punch now works correctly for both PMD and PTE DAX
> >    entries, 4k zero pages and huge zero pages.
> >  - Fixed the way that ext2 returns the size of holes in
> >    ext2_get_block(). (Jan)
> >  - Made the 'wait_table' global variable static in response to a
> >    sparse warning.
> >  - Fixed some more inconsistent usage between the names 'ret' and
> >    'entry' for radix tree entry variables.
> >
> > Ross Zwisler (9):
> >   ext4: allow DAX writeback for hole punch
> >   ext2: tell DAX the size of allocation holes
> >   ext4: tell DAX the size of allocation holes
> >   dax: remove buffer_size_valid()
> >   dax: make 'wait_table' global variable static
> >   dax: consistent variable naming for DAX entries
> >   dax: coordinate locking for offsets in PMD range
> >   dax: re-enable DAX PMD support
> >   dax: remove "depends on BROKEN" from FS_DAX_PMD
> >
> >  fs/Kconfig          |   1 -
> >  fs/dax.c            | 297 +++++++++++++++++++++++++++++-----------------------
> >  fs/ext2/inode.c     |   3 +
> >  fs/ext4/inode.c     |   7 +-
> >  include/linux/dax.h |  29 ++++-
> >  mm/filemap.c        |   6 +-
> >  6 files changed, 201 insertions(+), 142 deletions(-)
> >
> > --
> > 2.9.0
>
> Ping on this series?  Any objections or comments?

Hi Ross,

I am seeing a major performance loss in an fio mmap test with this
patchset applied.  This happens with or without my patches [1]
applied on top of yours.  Without my patches, dax_pmd_fault() falls
back to the pte handler since an mmap'ed address is not 2MB-aligned.

I have attached three test results.
 o rc4.log - 4.8.0-rc4 (base)
 o non-pmd.log - 4.8.0-rc4 + your patchset (fall back to pte)
 o pmd.log - 4.8.0-rc4 + your patchset + my patchset (use pmd maps)

My test steps are as follows.

mkfs.ext4 -O bigalloc -C 2M /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio test.fio

"test.fio"
---
[global]
bs=4k
size=2G
directory=/mnt/pmem0
ioengine=mmap
[randrw]
rw=randrw
---

Can you please take a look?
Thanks,
-Toshi

[1] https://lkml.org/lkml/2016/8/29/560
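
The 2MB-alignment condition mentioned above can be illustrated with a small sketch. This is only an illustration of the arithmetic, not the kernel's check: it assumes a PMD covers 2 MiB (as on x86-64), and the names PMD_SIZE, is_pmd_aligned and pmd_align_up are made up for this example.

```python
# Assumption: one PMD entry maps 2 MiB, as on x86-64.
PMD_SIZE = 2 * 1024 * 1024


def is_pmd_aligned(addr: int) -> bool:
    """Return True if addr sits on a 2 MiB boundary."""
    return (addr & (PMD_SIZE - 1)) == 0


def pmd_align_up(addr: int) -> int:
    """Round a non-negative addr up to the next 2 MiB boundary."""
    return (addr + PMD_SIZE - 1) & ~(PMD_SIZE - 1)


if __name__ == "__main__":
    # A typical mmap() result is only page-aligned (4 KiB), not PMD-aligned.
    addr = 0x7F0000001000
    print(hex(addr), is_pmd_aligned(addr), hex(pmd_align_up(addr)))
```

An address that fails this test cannot be the start of a 2 MiB mapping, which is consistent with the fallback to the pte handler described above.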

randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=12656: Wed Aug 31 18:14:06 2016
  read : io=1024.7MB, bw=3076.4KB/s, iops=769, runt=341062msec
    clat (usec): min=415, max=1703, avg=509.78, stdev=37.40
     lat (usec): min=415, max=1703, avg=509.81, stdev=37.40
    clat percentiles (usec):
     | 1.00th=[ 482], 5.00th=[ 498], 10.00th=[ 498], 20.00th=[ 498],
     | 30.00th=[ 502], 40.00th=[ 502], 50.00th=[ 502], 60.00th=[ 502],
     | 70.00th=[ 502], 80.00th=[ 506], 90.00th=[ 524], 95.00th=[ 540],
     | 99.00th=[ 724], 99.50th=[ 732], 99.90th=[ 748], 99.95th=[ 860],
     | 99.99th=[ 900]
    bw (KB /s): min= 2688, max= 3552, per=100.00%, avg=3078.69, stdev=143.84
  write: io=1023.4MB, bw=3072.6KB/s, iops=768, runt=341062msec
    clat (usec): min=683, max=1955, avg=788.99, stdev=45.83
     lat (usec): min=683, max=1955, avg=789.04, stdev=45.84
    clat percentiles (usec):
     | 1.00th=[ 756], 5.00th=[ 772], 10.00th=[ 772], 20.00th=[ 772],
     | 30.00th=[ 772], 40.00th=[ 780], 50.00th=[ 780], 60.00th=[ 780],
     | 70.00th=[ 780], 80.00th=[ 788], 90.00th=[ 812], 95.00th=[ 828],
     | 99.00th=[ 1004], 99.50th=[ 1012], 99.90th=[ 1128], 99.95th=[ 1144],
     | 99.99th=[ 1208]
    bw (KB /s): min= 2752, max= 3552, per=100.00%, avg=3074.60, stdev=96.62
    lat (usec) : 500=12.55%, 750=37.73%, 1000=48.96%
    lat (msec) : 2=0.76%
  cpu          : usr=99.96%, sys=0.01%, ctx=32870, majf=0, minf=3014
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1024.7MB, aggrb=3076KB/s, minb=3076KB/s, maxb=3076KB/s, mint=341062msec, maxt=341062msec
  WRITE: io=1023.4MB, aggrb=3072KB/s, minb=3072KB/s, maxb=3072KB/s, mint=341062msec, maxt=341062msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=19521: Wed Aug 31 17:50:39 2016
  read : io=1024.7MB, bw=3034.5KB/s, iops=758, runt=345780msec
    clat (usec): min=492, max=1359, avg=517.20, stdev=55.87
     lat (usec): min=492, max=1359, avg=517.23, stdev=55.87
    clat percentiles (usec):
     | 1.00th=[ 498], 5.00th=[ 498], 10.00th=[ 498], 20.00th=[ 498],
     | 30.00th=[ 502], 40.00th=[ 502], 50.00th=[ 502], 60.00th=[ 502],
     | 70.00th=[ 502], 80.00th=[ 506], 90.00th=[ 524], 95.00th=[ 708],
     | 99.00th=[ 740], 99.50th=[ 756], 99.90th=[ 900], 99.95th=[ 908],
     | 99.99th=[ 1048]
    bw (KB /s): min= 2600, max= 3448, per=100.00%, avg=3036.52, stdev=141.59
  write: io=1023.4MB, bw=3030.6KB/s, iops=757, runt=345780msec
    clat (usec): min=765, max=1788, avg=799.46, stdev=67.19
     lat (usec): min=766, max=1788, avg=799.50, stdev=67.20
    clat percentiles (usec):
     | 1.00th=[ 772], 5.00th=[ 772], 10.00th=[ 772], 20.00th=[ 772],
     | 30.00th=[ 772], 40.00th=[ 780], 50.00th=[ 780], 60.00th=[ 780],
     | 70.00th=[ 780], 80.00th=[ 788], 90.00th=[ 820], 95.00th=[ 996],
     | 99.00th=[ 1020], 99.50th=[ 1144], 99.90th=[ 1176], 99.95th=[ 1208],
     | 99.99th=[ 1320]
    bw (KB /s): min= 2704, max= 3328, per=100.00%, avg=3032.56, stdev=93.00
    lat (usec) : 500=10.66%, 750=39.06%, 1000=48.19%
    lat (msec) : 2=2.08%
  cpu          : usr=99.96%, sys=0.00%, ctx=32513, majf=0, minf=3012
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1024.7MB, aggrb=3034KB/s, minb=3034KB/s, maxb=3034KB/s, mint=345780msec, maxt=345780msec
  WRITE: io=1023.4MB, aggrb=3030KB/s, minb=3030KB/s, maxb=3030KB/s, mint=345780msec, maxt=345780msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
fio-2.6
Starting 1 process
randrw: Laying out IO file(s) (1 file(s) / 2048MB)

randrw: (groupid=0, jobs=1): err= 0: pid=12678: Wed Aug 31 19:59:45 2016
  read : io=1024.7MB, bw=775489KB/s, iops=193872, runt= 1353msec
    clat (usec): min=1, max=297, avg= 1.67, stdev= 2.92
     lat (usec): min=1, max=297, avg= 1.70, stdev= 2.96
    clat percentiles (usec):
     | 1.00th=[ 1], 5.00th=[ 1], 10.00th=[ 1], 20.00th=[ 1],
     | 30.00th=[ 1], 40.00th=[ 1], 50.00th=[ 2], 60.00th=[ 2],
     | 70.00th=[ 2], 80.00th=[ 2], 90.00th=[ 2], 95.00th=[ 2],
     | 99.00th=[ 3], 99.50th=[ 4], 99.90th=[ 12], 99.95th=[ 12],
     | 99.99th=[ 189]
    bw (KB /s): min=736608, max=792296, per=98.58%, avg=764452.00, stdev=39377.36
  write: io=1023.4MB, bw=774513KB/s, iops=193628, runt= 1353msec
    clat (usec): min=2, max=235, avg= 2.66, stdev= 3.59
     lat (usec): min=2, max=235, avg= 2.70, stdev= 3.61
    clat percentiles (usec):
     | 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
     | 30.00th=[ 2], 40.00th=[ 2], 50.00th=[ 3], 60.00th=[ 3],
     | 70.00th=[ 3], 80.00th=[ 3], 90.00th=[ 3], 95.00th=[ 3],
     | 99.00th=[ 4], 99.50th=[ 6], 99.90th=[ 13], 99.95th=[ 14],
     | 99.99th=[ 193]
    bw (KB /s): min=736288, max=789440, per=98.50%, avg=762864.00, stdev=37584.14
    lat (usec) : 2=20.18%, 4=78.23%, 10=1.40%, 20=0.16%, 50=0.01%
    lat (usec) : 250=0.03%, 500=0.01%
  cpu          : usr=46.82%, sys=53.03%, ctx=135, majf=0, minf=786279
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=262309/w=261979/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1024.7MB, aggrb=775488KB/s, minb=775488KB/s, maxb=775488KB/s, mint=1353msec, maxt=1353msec
  WRITE: io=1023.4MB, aggrb=774512KB/s, minb=774512KB/s, maxb=774512KB/s, mint=1353msec, maxt=1353msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
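
For comparing the aggregate numbers across logs like the ones above, a small helper along the following lines may be handy. It is only a sketch: it assumes the fio-2.6 "Run status" summary format seen in these logs (aggrb reported in KB/s), and the names aggregate_bandwidth and bandwidth_ratio are illustrative, not part of fio.

```python
import re

# Matches the "READ:" / "WRITE:" summary lines of fio-2.6 output, e.g.
#   READ: io=1024.7MB, aggrb=3076KB/s, minb=...
# Assumption: aggrb is always printed in KB/s, as in the logs above.
AGGRB_RE = re.compile(r"(READ|WRITE): io=[^,]+, aggrb=(\d+)KB/s")


def aggregate_bandwidth(log_text: str) -> dict:
    """Return {'READ': KB/s, 'WRITE': KB/s} parsed from one fio log."""
    return {kind: int(kbs) for kind, kbs in AGGRB_RE.findall(log_text)}


def bandwidth_ratio(fast: dict, slow: dict, kind: str = "READ") -> float:
    """How many times higher the bandwidth in `fast` is than in `slow`."""
    return fast[kind] / slow[kind]


if __name__ == "__main__":
    # Summary lines copied from two of the logs above.
    slow_log = ("READ: io=1024.7MB, aggrb=3076KB/s, minb=3076KB/s\n"
                "WRITE: io=1023.4MB, aggrb=3072KB/s, minb=3072KB/s\n")
    fast_log = ("READ: io=1024.7MB, aggrb=775488KB/s, minb=775488KB/s\n"
                "WRITE: io=1023.4MB, aggrb=774512KB/s, minb=774512KB/s\n")
    ratio = bandwidth_ratio(aggregate_bandwidth(fast_log),
                            aggregate_bandwidth(slow_log))
    print(f"READ bandwidth ratio: {ratio:.1f}x")
```

With the figures from the logs, the fast run is roughly 250 times faster than either slow run for both reads and writes.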