On 11/29/18 6:18 PM, Tom Talpey wrote: > On 11/29/2018 8:39 PM, John Hubbard wrote: >> On 11/28/18 5:59 AM, Tom Talpey wrote: >>> On 11/27/2018 9:52 PM, John Hubbard wrote: >>>> On 11/27/18 5:21 PM, Tom Talpey wrote: >>>>> On 11/21/2018 5:06 PM, John Hubbard wrote: >>>>>> On 11/21/18 8:49 AM, Tom Talpey wrote: >>>>>>> On 11/21/2018 1:09 AM, John Hubbard wrote: >>>>>>>> On 11/19/18 10:57 AM, Tom Talpey wrote: >>>> [...] >>>>> I'm super-limited here this week hardware-wise and have not been able >>>>> to try testing with the patched kernel. >>>>> >>>>> I was able to compare my earlier quick test with a Bionic 4.15 kernel >>>>> (400K IOPS) against a similar 4.20rc3 kernel, and the rate dropped to >>>>> ~_375K_ IOPS. Which I found perhaps troubling. But it was only a quick >>>>> test, and without your change. >>>>> >>>> >>>> So just to double check (again): you are running fio with these parameters, >>>> right? >>>> >>>> [reader] >>>> direct=1 >>>> ioengine=libaio >>>> blocksize=4096 >>>> size=1g >>>> numjobs=1 >>>> rw=read >>>> iodepth=64 >>> >>> Correct, I copy/pasted these directly. I also ran with size=10g because >>> the 1g provides a really small sample set. >>> >>> There was one other difference, your results indicated fio 3.3 was used. >>> My Bionic install has fio 3.1. I don't find that relevant because our >>> goal is to compare before/after, which I haven't done yet. >>> >> >> OK, the 50 MB/s was due to my particular .config. I had some expensive debug options >> set in mm, fs and locking subsystems. Turning those off, I'm back up to the rated >> speed of the Samsung NVMe device, so now we should have a clearer picture of the >> performance that real users will see. > > Oh, good! I'm especially glad because I was having a heck of a time > reconfiguring the one machine I have available for this. > >> Continuing on, then: running a before and after test, I don't see any significant >> difference in the fio results: > > Excerpting from below: > >> Baseline 4.20.0-rc3 (commit f2ce1065e767), as before: >> read: IOPS=193k, BW=753MiB/s (790MB/s)(1024MiB/1360msec) >> cpu : usr=16.26%, sys=48.05%, ctx=251258, majf=0, minf=73 > > vs > >> With patches applied: >> read: IOPS=193k, BW=753MiB/s (790MB/s)(1024MiB/1360msec) >> cpu : usr=16.26%, sys=48.05%, ctx=251258, majf=0, minf=73 > > Perfect results, not CPU limited, and full IOPS. > > Curiously identical, so I trust you've checked that you measured > both targets, but if so, I say it's good. > Argh, copy-paste error in the email. The real "before" is ever so slightly better, at 194K IOPS and 759 MB/s: $ fio ./experimental-fio.conf reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64 fio-3.3 Starting 1 process Jobs: 1 (f=1) reader: (groupid=0, jobs=1): err= 0: pid=1715: Thu Nov 29 17:07:09 2018 read: IOPS=194k, BW=759MiB/s (795MB/s)(1024MiB/1350msec) slat (nsec): min=1245, max=2812.7k, avg=1538.03, stdev=5519.61 clat (usec): min=148, max=755, avg=326.85, stdev=18.13 lat (usec): min=150, max=3483, avg=328.41, stdev=19.53 clat percentiles (usec): | 1.00th=[ 322], 5.00th=[ 326], 10.00th=[ 326], 20.00th=[ 326], | 30.00th=[ 326], 40.00th=[ 326], 50.00th=[ 326], 60.00th=[ 326], | 70.00th=[ 326], 80.00th=[ 326], 90.00th=[ 326], 95.00th=[ 326], | 99.00th=[ 355], 99.50th=[ 537], 99.90th=[ 553], 99.95th=[ 553], | 99.99th=[ 619] bw ( KiB/s): min=767816, max=783096, per=99.84%, avg=775456.00, stdev=10804.59, samples=2 iops : min=191954, max=195774, avg=193864.00, stdev=2701.15, samples=2 lat (usec) : 250=0.09%, 500=99.30%, 750=0.61%, 1000=0.01% cpu : usr=18.24%, sys=44.77%, ctx=251527, majf=0, minf=73 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=759MiB/s (795MB/s), 759MiB/s-759MiB/s (795MB/s-795MB/s), io=1024MiB (1074MB), run=1350-1350msec Disk stats (read/write): nvme0n1: ios=222853/0, merge=0/0, ticks=71410/0, in_queue=71935, util=100.00% thanks, -- John Hubbard NVIDIA > >> >> fio.conf: >> >> [reader] >> direct=1 >> ioengine=libaio >> blocksize=4096 >> size=1g >> numjobs=1 >> rw=read >> iodepth=64 >> >> --------------------------------------------------------- >> Baseline 4.20.0-rc3 (commit f2ce1065e767), as before: >> [deleted with prejudice. See the correction above, instead] --jhubbard >> >> --------------------------------------------------------- >> With patches applied: >> >> <redforge> fast_256GB $ fio ./experimental-fio.conf >> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64 >> fio-3.3 >> Starting 1 process >> Jobs: 1 (f=1) >> reader: (groupid=0, jobs=1): err= 0: pid=1738: Thu Nov 29 17:20:07 2018 >> read: IOPS=193k, BW=753MiB/s (790MB/s)(1024MiB/1360msec) >> slat (nsec): min=1381, max=46469, avg=1649.48, stdev=594.46 >> clat (usec): min=162, max=12247, avg=330.00, stdev=185.55 >> lat (usec): min=165, max=12253, avg=331.68, stdev=185.69 >> clat percentiles (usec): >> | 1.00th=[ 322], 5.00th=[ 326], 10.00th=[ 326], 20.00th=[ 326], >> | 30.00th=[ 326], 40.00th=[ 326], 50.00th=[ 326], 60.00th=[ 326], >> | 70.00th=[ 326], 80.00th=[ 326], 90.00th=[ 326], 95.00th=[ 326], >> | 99.00th=[ 379], 99.50th=[ 594], 99.90th=[ 603], 99.95th=[ 611], >> | 99.99th=[12125] >> bw ( KiB/s): min=751640, max=782912, per=99.52%, avg=767276.00, stdev=22112.64, samples=2 >> iops : min=187910, max=195728, avg=191819.00, stdev=5528.16, samples=2 >> lat (usec) : 250=0.08%, 500=99.30%, 750=0.59% >> lat (msec) : 20=0.02% >> cpu : usr=16.26%, sys=48.05%, ctx=251258, majf=0, minf=73 >> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% >> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% >> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% >> issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0 >> latency : target=0, window=0, percentile=100.00%, depth=64 >> >> Run status group 0 (all jobs): >> READ: bw=753MiB/s (790MB/s), 753MiB/s-753MiB/s (790MB/s-790MB/s), io=1024MiB (1074MB), run=1360-1360msec >> >> Disk stats (read/write): >> nvme0n1: ios=220798/0, merge=0/0, ticks=71481/0, in_queue=71966, util=100.00% >> >> >> thanks, >> >