Understanding Bluestore performance characteristics

Hi,

We have a production cluster of 27 OSDs across 5 servers (all SSDs
running BlueStore), and have started to notice a possible performance issue.

To isolate the problem, we built a single server with a single OSD and ran
a few fio tests. We were not expecting good performance from a single OSD,
but the results are still puzzling.

In short, during a sequential write test we are seeing a huge number of
reads hitting the actual SSD.

Key FIO parameters are:

[global]
pool=benchmarks
rbdname=disk-1
direct=1
numjobs=2
iodepth=1
blocksize=4k
group_reporting=1
[writer]
readwrite=write

iostat results are:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00   105.00 4896.00  294.00 312080.00  1696.00   120.92    17.25    3.35    3.55    0.02   0.02  12.60

There are nearly 5,000 reads per second (~300 MB/s), compared with only
~300 writes per second (~1.7 MB/s), even though this is a sequential write
test. The system is otherwise idle, with no other workload.

Running the same fio test with only one job (numjobs=1) still shows a high
number of reads (110 per second).

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00  1281.00  110.00 1463.00   440.00 12624.00    16.61     0.03    0.02    0.05    0.02   0.02   3.40
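
To make the two iostat samples easier to compare, here is a small sketch
that derives the average size of each read and write request from the
figures above. It assumes iostat's usual units (rkB/s and wkB/s in
kilobytes, avgrq-sz in 512-byte sectors); the dictionary keys and the
derive() helper are names made up for this illustration.

```python
# Per-request sizes derived from the two iostat samples quoted above.
# Assumption: rkB/s and wkB/s are kilobytes per second, and avgrq-sz is
# reported in 512-byte sectors (so 1 kB = 2 sectors).

SAMPLES = {
    "numjobs=2": {"r_s": 4896.00, "w_s": 294.00,
                  "rkB_s": 312080.00, "wkB_s": 1696.00, "avgrq_sz": 120.92},
    "numjobs=1": {"r_s": 110.00, "w_s": 1463.00,
                  "rkB_s": 440.00, "wkB_s": 12624.00, "avgrq_sz": 16.61},
}


def derive(sample):
    """Return average request sizes and the read/write byte ratio."""
    read_kb = sample["rkB_s"] / sample["r_s"]      # kB per read request
    write_kb = sample["wkB_s"] / sample["w_s"]     # kB per write request
    total_kb = sample["rkB_s"] + sample["wkB_s"]
    total_req = sample["r_s"] + sample["w_s"]
    # Recompute avgrq-sz in sectors to cross-check iostat's own column.
    avg_sectors = 2 * total_kb / total_req
    return {
        "read_kb": read_kb,
        "write_kb": write_kb,
        "avg_sectors": avg_sectors,
        "read_write_bytes": sample["rkB_s"] / sample["wkB_s"],
    }


if __name__ == "__main__":
    for name, sample in SAMPLES.items():
        d = derive(sample)
        print(f"{name}: {d['read_kb']:.2f} kB/read, "
              f"{d['write_kb']:.2f} kB/write, "
              f"avgrq-sz {d['avg_sectors']:.2f} sectors "
              f"(iostat: {sample['avgrq_sz']}), "
              f"read/write bytes {d['read_write_bytes']:.2f}")
```

On these numbers the reads in the two-job run average roughly 64 kB each,
whereas in the single-job run they average 4 kB, so the two runs differ in
the size of the reads as well as in their rate.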

Can anyone kindly offer any comments on why we are seeing this behaviour?

I can understand if there's the occasional read here and there if
RocksDB/WAL entries need to be read from disk during the sequential write
test, but this seems significantly high and unusual.

FIO results (numjobs=2):
writer: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
...
fio-3.7
Starting 2 processes
Jobs: 1 (f=1): [W(1),_(1)][52.4%][r=0KiB/s,w=208KiB/s][r=0,w=52 IOPS][eta 01m:00s]
writer: (groupid=0, jobs=2): err= 0: pid=19553: Mon Feb  3 22:46:16 2020
  write: IOPS=34, BW=137KiB/s (140kB/s)(8228KiB/60038msec)
    slat (nsec): min=5402, max=77083, avg=27305.33, stdev=7786.83
    clat (msec): min=2, max=210, avg=58.32, stdev=70.54
     lat (msec): min=2, max=210, avg=58.35, stdev=70.54
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[   54], 60.00th=[   62],
     | 70.00th=[   65], 80.00th=[  174], 90.00th=[  188], 95.00th=[  194],
     | 99.00th=[  201], 99.50th=[  203], 99.90th=[  209], 99.95th=[  209],
     | 99.99th=[  211]
   bw (  KiB/s): min=   24, max=  144, per=49.69%, avg=68.08, stdev=38.22, samples=239
   iops        : min=    6, max=   36, avg=16.97, stdev= 9.55, samples=239
  lat (msec)   : 4=49.83%, 10=0.10%, 100=29.90%, 250=20.18%
  cpu          : usr=0.08%, sys=0.08%, ctx=2100, majf=0, minf=118
  IO depths    : 1=105.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2057,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=137KiB/s (140kB/s), 137KiB/s-137KiB/s (140kB/s-140kB/s), io=8228KiB (8425kB), run=60038-60038msec
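
As a rough sanity check on the fio summary above, the reported IOPS and
bandwidth can be recomputed from the issued write count and the runtime.
The sketch below also compares them with what a closed loop of numjobs x
iodepth outstanding requests would give at the reported average latency
(Little's law). All inputs are copied from the output above; the variable
names are only illustrative, and the closed-loop model is an assumption
that both jobs ran for the whole 60 seconds.

```python
# Cross-check of the fio summary: IOPS and bandwidth from the raw counts,
# and the IOPS implied by the average latency at a fixed concurrency.

total_writes = 2057      # "issued rwts: total=0,2057,0,0"
runtime_s = 60.038       # "run=60038-60038msec"
block_kib = 4            # blocksize=4k
numjobs = 2
iodepth = 1
avg_lat_s = 58.35e-3     # "lat (msec): ... avg=58.35"

# Throughput from the raw counters.
iops = total_writes / runtime_s
bw_kib_s = iops * block_kib
total_kib = total_writes * block_kib

# Little's law for a closed loop: throughput = concurrency / latency.
concurrency = numjobs * iodepth
iops_from_latency = concurrency / avg_lat_s

if __name__ == "__main__":
    print(f"IOPS from counts:   {iops:.2f}")
    print(f"BW from counts:     {bw_kib_s:.2f} KiB/s")
    print(f"Total written:      {total_kib} KiB")
    print(f"IOPS from latency:  {iops_from_latency:.2f}")
```

Both estimates come out at about 34 IOPS, which suggests the low
throughput is fully accounted for by the ~58 ms average write latency
rather than by fio under-issuing requests.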