Re: Buffered IO performance hit

On Thu, Nov 16, 2017 at 4:44 PM, Ivan Babrou <ibobrik@xxxxxxxxx> wrote:
> Hello,
>
> We are seeing high IO latency spikes during writeback. While writes can
> obviously slow down reads, we don't see the issue with direct IO. We
> tested different schedulers, writeback throttling and a bunch of
> different machines on 4.9 and 4.14 with both SATA and NVMe devices.
> The results are very consistent.
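>
> For reference, the scheduler and writeback-throttling switches
> mentioned above are flipped via sysfs, roughly like this (the sdb
> device name and the 75ms target are assumptions for illustration;
> wbt_lat_usec only exists on 4.10+ kernels built with CONFIG_BLK_WBT):
>
> # select the I/O scheduler for the test device
> echo deadline | sudo tee /sys/block/sdb/queue/scheduler
> # set the writeback throttling latency target in usec (0 disables it)
> echo 75000 | sudo tee /sys/block/sdb/queue/wbt_lat_usec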
>
> There are a lot of numbers below, but please bear with me. TL;DR:
> writeback elevates latencies a lot more than direct writes do.
>
> Consider the following fio scenario:
>
> [global]
> direct=1
> runtime=30m
> group_reporting
> time_based
>
> [randread]
> rw=randread
> size=1g
> bs=4k
> numjobs=20
>
> We run this on 4.9.59 with XFS on Micron 1100 SSD with deadline
> scheduler. It's a dedicated disk with no activity besides our tests.
> With no writes present, xfsdist from bcc-tools reports the following
> latency distribution (in milliseconds):
>
> $ sudo /usr/share/bcc/tools/xfsdist -m
> Tracing XFS operation latency... Hit Ctrl-C to end.
> ^C
>
> operation = 'read'
>      msecs               : count     distribution
>          0 -> 1          : 482298   |****************************************|
>
> All reads complete in under a millisecond here. Here's the output from fio:
>
> $ sudo fio ~/slowread.fio
> randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> fio-2.1.11
> Starting 20 processes
> ^Cbs: 20 (f=20): [r(20)] [3.8% done] [312.1MB/0KB/0KB /s] [80.2K/0/0
> iops] [eta 28m:52s]
> fio: terminating on signal 2
>
> randread: (groupid=0, jobs=20): err= 0: pid=11804: Wed Nov 15 01:22:00 2017
>   read : io=21197MB, bw=317816KB/s, iops=79453, runt= 68297msec
>     clat (usec): min=64, max=6860, avg=250.49, stdev=93.80
>      lat (usec): min=64, max=6860, avg=250.56, stdev=93.80
>     clat percentiles (usec):
>      |  1.00th=[  149],  5.00th=[  159], 10.00th=[  165], 20.00th=[  177],
>      | 30.00th=[  189], 40.00th=[  207], 50.00th=[  225], 60.00th=[  245],
>      | 70.00th=[  270], 80.00th=[  310], 90.00th=[  374], 95.00th=[  438],
>      | 99.00th=[  580], 99.50th=[  644], 99.90th=[  764], 99.95th=[  820],
>      | 99.99th=[  940]
>     bw (KB  /s): min=13096, max=16432, per=5.00%, avg=15898.70, stdev=196.77
>     lat (usec) : 100=0.01%, 250=61.93%, 500=35.54%, 750=2.40%, 1000=0.12%
>     lat (msec) : 2=0.01%, 10=0.01%
>   cpu          : usr=0.91%, sys=3.48%, ctx=5426681, majf=0, minf=142
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=5426464/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    READ: io=21197MB, aggrb=317815KB/s, minb=317815KB/s,
> maxb=317815KB/s, mint=68297msec, maxt=68297msec
>
> Disk stats (read/write):
>   sdb: ios=5416500/2, merge=0/0, ticks=1323879/1, in_queue=1325625, util=100.00%
>
> All good here.
>
> Let's measure pure write performance:
>
> $ sudo dd if=/dev/zero of=/cache2/zeros bs=1M count=2K oflag=direct
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 5.06064 s, 424 MB/s
>
> $ sudo /usr/share/bcc/tools/xfsdist -m
> Tracing XFS operation latency... Hit Ctrl-C to end.
> ^C
>
> operation = 'write'
>      msecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 2042     |****************************************|
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 6        |                                        |
>
> operation = 'open'
>      msecs               : count     distribution
>          0 -> 1          : 4        |****************************************|
>
>
> The write baseline is 2-3 ms, plus a few spikes in the 8-15 ms bucket.
>
> If we do buffered writes instead of direct writes:
>
> $ sudo dd if=/dev/zero of=/cache2/zeros bs=1M count=2K
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 4.70492 s, 456 MB/s
>
> $ sudo /usr/share/bcc/tools/xfsdist -m
> Tracing XFS operation latency... Hit Ctrl-C to end.
> ^C
>
> operation = 'write'
>      msecs               : count     distribution
>          0 -> 1          : 2048     |****************************************|
>
> operation = 'open'
>      msecs               : count     distribution
>          0 -> 1          : 4        |****************************************|
>
> Latency is lower, which is expected: buffered writes only copy into the
> page cache, and the actual disk writes happen later during writeback.
>
> Let's now mix these scenarios.
>
> * Reads and direct writes
>
> $ sudo fio ~/slowread.fio
> randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> fio-2.1.11
> Starting 20 processes
> ^Cbs: 20 (f=20): [r(20)] [1.2% done] [311.9MB/0KB/0KB /s] [79.9K/0/0
> iops] [eta 29m:40s]
> fio: terminating on signal 2
>
> randread: (groupid=0, jobs=20): err= 0: pid=11965: Wed Nov 15 01:25:46 2017
>   read : io=5026.1MB, bw=253054KB/s, iops=63263, runt= 20342msec
>     clat (usec): min=131, max=14120, avg=314.79, stdev=441.68
>      lat (usec): min=131, max=14121, avg=314.87, stdev=441.75
>     clat percentiles (usec):
>      |  1.00th=[  149],  5.00th=[  157], 10.00th=[  165], 20.00th=[  177],
>      | 30.00th=[  191], 40.00th=[  207], 50.00th=[  227], 60.00th=[  249],
>      | 70.00th=[  278], 80.00th=[  322], 90.00th=[  402], 95.00th=[  502],
>      | 99.00th=[ 2672], 99.50th=[ 3664], 99.90th=[ 4384], 99.95th=[ 4448],
>      | 99.99th=[ 4768]
>     bw (KB  /s): min= 1197, max=16416, per=4.98%, avg=12608.77, stdev=5953.29
>     lat (usec) : 250=60.53%, 500=34.40%, 750=2.32%, 1000=0.19%
>     lat (msec) : 2=0.26%, 4=1.82%, 10=0.47%, 20=0.01%
>   cpu          : usr=0.65%, sys=2.96%, ctx=1286981, majf=0, minf=139
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=1286905/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    READ: io=5026.1MB, aggrb=253053KB/s, minb=253053KB/s,
> maxb=253053KB/s, mint=20342msec, maxt=20342msec
>
> Disk stats (read/write):
>   sdb: ios=1271838/4099, merge=0/1, ticks=392418/6967,
> in_queue=399788, util=99.60%
>
> $ sudo dd if=/dev/zero of=/cache2/zeros bs=1M count=2K oflag=direct
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 4.69243 s, 458 MB/s
>
> $ sudo /usr/share/bcc/tools/xfsdist -m
> Tracing XFS operation latency... Hit Ctrl-C to end.
> ^C
>
> operation = 'read'
>      msecs               : count     distribution
>          0 -> 1          : 792289   |****************************************|
>          2 -> 3          : 22484    |*                                       |
>          4 -> 7          : 5817     |                                        |
>          8 -> 15         : 112      |                                        |
>
> operation = 'write'
>      msecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 2042     |****************************************|
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 6        |                                        |
>
> operation = 'open'
>      msecs               : count     distribution
>          0 -> 1          : 1        |****************************************|
>
> Reads are definitely slower now. This is expected, because writes
> occupy some bandwidth.
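>
> A quick way to see this split live, assuming the same sdb device and
> sysstat's iostat, is to watch per-direction completion latencies while
> both workloads run:
>
> # r_await/w_await show read vs write completion latency each second
> iostat -x 1 sdb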
>
> * Reads and buffered writes
>
> $ sudo fio ~/slowread.fio
> randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> fio-2.1.11
> Starting 20 processes
> ^Cbs: 20 (f=20): [r(20)] [1.2% done] [309.6MB/0KB/0KB /s] [79.3K/0/0
> iops] [eta 29m:38s]
> fio: terminating on signal 2
>
> randread: (groupid=0, jobs=20): err= 0: pid=12011: Wed Nov 15 01:27:00 2017
>   read : io=5344.6MB, bw=255500KB/s, iops=63874, runt= 21420msec
>     clat (usec): min=88, max=129414, avg=311.99, stdev=2022.27
>      lat (usec): min=88, max=129415, avg=312.06, stdev=2022.29
>     clat percentiles (usec):
>      |  1.00th=[  149],  5.00th=[  157], 10.00th=[  165], 20.00th=[  177],
>      | 30.00th=[  191], 40.00th=[  207], 50.00th=[  225], 60.00th=[  247],
>      | 70.00th=[  274], 80.00th=[  310], 90.00th=[  378], 95.00th=[  442],
>      | 99.00th=[  596], 99.50th=[  660], 99.90th=[ 1012], 99.95th=[64256],
>      | 99.99th=[75264]
>     bw (KB  /s): min=   40, max=16448, per=5.06%, avg=12924.17, stdev=5972.12
>     lat (usec) : 100=0.01%, 250=61.56%, 500=35.78%, 750=2.44%, 1000=0.12%
>     lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
>     lat (msec) : 100=0.08%, 250=0.01%
>   cpu          : usr=0.46%, sys=3.19%, ctx=1368274, majf=0, minf=134
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=1368200/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    READ: io=5344.6MB, aggrb=255499KB/s, minb=255499KB/s,
> maxb=255499KB/s, mint=21420msec, maxt=21420msec
>
> Disk stats (read/write):
>   sdb: ios=1366013/2048, merge=0/0, ticks=417412/573945,
> in_queue=993029, util=99.87%
>
> $ sudo dd if=/dev/zero of=/cache2/zeros bs=1M count=2K
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 4.83031 s, 445 MB/s
>
> $ sudo /usr/share/bcc/tools/xfsdist -m
> Tracing XFS operation latency... Hit Ctrl-C to end.
> ^C
>
> operation = 'read'
>      msecs               : count     distribution
>          0 -> 1          : 892854   |****************************************|
>          2 -> 3          : 12       |                                        |
>          4 -> 7          : 8        |                                        |
>          8 -> 15         : 1        |                                        |
>         16 -> 31         : 37       |                                        |
>         32 -> 63         : 483      |                                        |
>         64 -> 127        : 787      |                                        |
>        128 -> 255        : 3        |                                        |
>
> operation = 'write'
>      msecs               : count     distribution
>          0 -> 1          : 2048     |****************************************|
>
> operation = 'open'
>      msecs               : count     distribution
>          0 -> 1          : 1        |****************************************|
>
> This is a lot worse, even though we wrote the same amount of data to
> the same disk. Some reads went up into the 100ms+ range, which is not
> great for an SSD. Read IOPS in fio drop from ~70,000 to ~700 during
> writeback. This doesn't seem right.
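>
> To confirm the stalls line up with the flush, the dirty/writeback
> counters can be watched while fio runs; a minimal sketch:
>
> # Dirty shrinks and Writeback spikes while the flush is in flight
> watch -d -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'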
>
> The deadline scheduler is the best case here; with cfq we saw read
> latencies of 4096ms and more.
>
> Is this expected? How can we improve the situation to avoid hitting
> such high latencies?
>
> Thanks.


I re-ran the test with an Intel S3500 SSD (XFS on LUKS, deadline
scheduler); remounting the filesystem with the "sync" option improved
read performance during writes considerably.

* Buffered writes: ~80-110 reads/s during writeback flush

$ fio ~/slowread.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.16
Starting 20 processes
^Cbs: 20 (f=20): [r(20)] [2.0% done] [288.3MB/0KB/0KB /s] [73.8K/0/0
iops] [eta 29m:25s]
fio: terminating on signal 2

randread: (groupid=0, jobs=20): err= 0: pid=6376: Sun Nov 19 19:19:44 2017
  read : io=6152.5MB, bw=177484KB/s, iops=44370, runt= 35497msec
    clat (usec): min=113, max=552524, avg=449.86, stdev=6366.77
     lat (usec): min=113, max=552524, avg=449.95, stdev=6366.78
    clat percentiles (usec):
     |  1.00th=[  161],  5.00th=[  183], 10.00th=[  197], 20.00th=[  215],
     | 30.00th=[  227], 40.00th=[  241], 50.00th=[  255], 60.00th=[  270],
     | 70.00th=[  290], 80.00th=[  318], 90.00th=[  362], 95.00th=[  406],
     | 99.00th=[  516], 99.50th=[  564], 99.90th=[ 2736], 99.95th=[214016],
     | 99.99th=[325632]
    lat (usec) : 250=46.46%, 500=52.36%, 750=1.02%, 1000=0.03%
    lat (msec) : 2=0.03%, 4=0.02%, 10=0.01%, 20=0.01%, 100=0.01%
    lat (msec) : 250=0.08%, 500=0.01%, 750=0.01%
  cpu          : usr=0.39%, sys=1.63%, ctx=1575395, majf=0, minf=214
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=1575037/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=6152.5MB, aggrb=177483KB/s, minb=177483KB/s,
maxb=177483KB/s, mint=35497msec, maxt=35497msec

Disk stats (read/write):
    dm-12: ios=1567287/2057, merge=0/0, ticks=696784/14558300,
in_queue=15256820, util=99.82%, aggrios=1575037/2085, aggrmerge=0/1,
aggrticks=688660/1972564, aggrin_queue=2662772, aggrutil=99.81%
  sdm: ios=1575037/2085, merge=0/1, ticks=688660/1972564,
in_queue=2662772, util=99.81%


$ sudo /usr/share/bcc/tools/xfsdist -m
Tracing XFS operation latency... Hit Ctrl-C to end.
^C

operation = 'read'
     msecs               : count     distribution
         0 -> 1          : 657835   |****************************************|
         2 -> 3          : 105      |                                        |
         4 -> 7          : 16       |                                        |
         8 -> 15         : 4        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 638      |                                        |
       128 -> 255        : 593      |                                        |
       256 -> 511        : 219      |                                        |
       512 -> 1023       : 1        |                                        |

operation = 'write'
     msecs               : count     distribution
         0 -> 1          : 2057     |****************************************|

operation = 'open'
     msecs               : count     distribution
         0 -> 1          : 1        |****************************************|

$ sudo dd if=/dev/zero of=zeros bs=1M count=2K
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.0991 s, 2.0 GB/s
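
The 2.0 GB/s above is mostly page-cache speed; if dd is told to flush at
the end, the reported rate includes the writeback time as well. A sketch
with the same parameters:

$ sudo dd if=/dev/zero of=zeros bs=1M count=2K conv=fsync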

* Remount the filesystem with the "sync" option

$ sudo mount -o remount,sync /state
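
To double-check that the option took effect (mount point taken from the
command above):

$ grep /state /proc/mounts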

* Synchronous writes (after the remount): ~13000 reads/s during the write

$ fio ~/slowread.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.16
Starting 20 processes
^Cbs: 20 (f=20): [r(20)] [2.0% done] [283.3MB/0KB/0KB /s] [72.5K/0/0
iops] [eta 29m:25s]
fio: terminating on signal 2

randread: (groupid=0, jobs=20): err= 0: pid=6956: Sun Nov 19 19:21:00 2017
 read : io=5431.5MB, bw=157443KB/s, iops=39360, runt= 35326msec
   clat (usec): min=113, max=9525, avg=507.17, stdev=1037.08
    lat (usec): min=113, max=9525, avg=507.28, stdev=1037.09
   clat percentiles (usec):
    |  1.00th=[  159],  5.00th=[  183], 10.00th=[  197], 20.00th=[  215],
    | 30.00th=[  231], 40.00th=[  247], 50.00th=[  262], 60.00th=[  278],
    | 70.00th=[  306], 80.00th=[  346], 90.00th=[  462], 95.00th=[ 1768],
    | 99.00th=[ 6432], 99.50th=[ 7264], 99.90th=[ 8256], 99.95th=[ 8384],
    | 99.99th=[ 8640]
   lat (usec) : 250=42.71%, 500=48.22%, 750=2.00%, 1000=0.48%
   lat (msec) : 2=2.05%, 4=1.79%, 10=2.75%
 cpu          : usr=0.39%, sys=1.77%, ctx=1390713, majf=0, minf=193
 IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1390460/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: io=5431.5MB, aggrb=157443KB/s, minb=157443KB/s,
maxb=157443KB/s, mint=35326msec, maxt=35326msec

Disk stats (read/write):
   dm-12: ios=1377040/4218, merge=0/0, ticks=692092/18748,
in_queue=713500, util=99.85%, aggrios=1390460/8353, aggrmerge=0/1,
aggrticks=686052/20124, aggrin_queue=706764, aggrutil=99.79%
 sdm: ios=1390460/8353, merge=0/1, ticks=686052/20124,
in_queue=706764, util=99.79%

$ sudo /usr/share/bcc/tools/xfsdist -m
Tracing XFS operation latency... Hit Ctrl-C to end.
^C

operation = 'read'
    msecs               : count     distribution
        0 -> 1          : 749225   |****************************************|
        2 -> 3          : 24720    |*                                       |
        4 -> 7          : 35377    |*                                       |
        8 -> 15         : 2764     |                                        |

operation = 'open'
    msecs               : count     distribution
        0 -> 1          : 263      |****************************************|

operation = 'fsync'
    msecs               : count     distribution
        0 -> 1          : 3        |                                        |
        2 -> 3          : 1        |                                        |
        4 -> 7          : 39       |                                        |
        8 -> 15         : 2014     |****************************************|

$ sudo dd if=/dev/zero of=zeros bs=1M count=2K
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 20.2343 s, 106 MB/s

Making the writes synchronous (note the fsync operations in the xfsdist
output above) improved read throughput during writes by roughly 162x
(~80 reads/s to ~13000 reads/s) and cut the maximum read latency from
552 ms to 9.5 ms.
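
For completeness, the other knobs we would normally tune to keep the
flush bursts small are the dirty-memory sysctls (and wbt_lat_usec on
4.10+); the values below are illustrative assumptions, not
recommendations:

# keep the amount of dirty page cache small so each flush is short
sudo sysctl -w vm.dirty_background_bytes=67108864   # 64 MiB
sudo sysctl -w vm.dirty_bytes=268435456             # 256 MiB
# expire dirty data sooner (value is in centiseconds)
sudo sysctl -w vm.dirty_expire_centisecs=500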


