Hi,
We have a production cluster of 27 OSDs across 5 servers (all SSDs running
bluestore), and have started to notice a possible performance issue.

To isolate the problem, we built a single server with a single OSD and ran a
few FIO tests. The results are puzzling, even allowing for the fact that we
weren't expecting great performance from a single OSD.

In short, during a sequential write test we are seeing a huge number of reads
hitting the actual SSD.
Key FIO parameters are:
[global]
pool=benchmarks
rbdname=disk-1
direct=1
numjobs=2
iodepth=1
blocksize=4k
group_reporting=1
[writer]
readwrite=write
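
For completeness, the job file as a whole looks roughly like this (ioengine=rbd
and iodepth=1 show up in the fio output further down, the ~60 second runtime is
inferred from the run time in the results, and clientname is just a placeholder
for whichever cephx user the test runs as):

[global]
ioengine=rbd
# placeholder; whichever cephx user actually runs the test
clientname=admin
pool=benchmarks
rbdname=disk-1
direct=1
numjobs=2
iodepth=1
blocksize=4k
# inferred from the ~60s run time reported in the results below
runtime=60
group_reporting=1

[writer]
readwrite=write
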
iostat results are:
Device:          rrqm/s   wrqm/s      r/s      w/s      rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1            0.00   105.00  4896.00   294.00  312080.00   1696.00   120.92    17.25    3.35    3.55    0.02   0.02  12.60
That is nearly 5,000 reads/second (~300 MB/s), compared with only ~300
writes/second (~1.5 MB/s), during what is purely a sequential write test. The
system is otherwise idle, with no other workload.
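
Working backwards from those iostat columns (taking rkB/s and wkB/s at face
value), the reads are not even small ones:

    312080 kB/s / 4896 reads/s  ≈ 64 kB per read
    1696 kB/s   /  294 writes/s ≈ 5.8 kB per write

so the drive is servicing roughly 64 kB reads while fio is only issuing 4k writes.
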
Running the same fio test with only one thread (numjobs=1) still shows a high
number of reads (110/sec):
Device:          rrqm/s   wrqm/s      r/s      w/s      rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1            0.00  1281.00   110.00  1463.00     440.00  12624.00    16.61     0.03    0.02    0.05    0.02   0.02   3.40
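
The same arithmetic for the single-job run:

    440 kB/s / 110 reads/s ≈ 4 kB per read

so with one job the reads at least look 4k-sized, rather than the ~64 kB reads
seen above.
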
Can anyone kindly offer any comments on why we are seeing this behaviour? I can
understand the occasional read here and there if RocksDB/WAL entries need to be
read from disk during the sequential write test, but this seems unusually high.
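
To help narrow down where the reads come from, we can snapshot the OSD's
internal counters around a run and diff them, along these lines (a rough sketch
only: osd.0 is our single test OSD, write-test.fio stands in for the job file
above, and the exact bluefs/bluestore counter names vary between releases):

# Snapshot the perf counters before and after the fio run, then diff the two
# dumps and look at anything read-related (bluefs = RocksDB/WAL vs. bluestore).
ceph daemon osd.0 perf dump > perf_before.json
fio write-test.fio
ceph daemon osd.0 perf dump > perf_after.json
diff <(python -m json.tool perf_before.json) \
     <(python -m json.tool perf_after.json) | grep -i read

Happy to post the before/after diff if that would help.
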
FIO results (numjobs=2)
writer: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
...
fio-3.7
Starting 2 processes
Jobs: 1 (f=1): [W(1),_(1)][52.4%][r=0KiB/s,w=208KiB/s][r=0,w=52 IOPS][eta 01m:00s]
writer: (groupid=0, jobs=2): err= 0: pid=19553: Mon Feb  3 22:46:16 2020
  write: IOPS=34, BW=137KiB/s (140kB/s)(8228KiB/60038msec)
    slat (nsec): min=5402, max=77083, avg=27305.33, stdev=7786.83
    clat (msec): min=2, max=210, avg=58.32, stdev=70.54
     lat (msec): min=2, max=210, avg=58.35, stdev=70.54
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[   54], 60.00th=[   62],
     | 70.00th=[   65], 80.00th=[  174], 90.00th=[  188], 95.00th=[  194],
     | 99.00th=[  201], 99.50th=[  203], 99.90th=[  209], 99.95th=[  209],
     | 99.99th=[  211]
   bw (  KiB/s): min=   24, max=  144, per=49.69%, avg=68.08, stdev=38.22, samples=239
   iops        : min=    6, max=   36, avg=16.97, stdev= 9.55, samples=239
  lat (msec)   : 4=49.83%, 10=0.10%, 100=29.90%, 250=20.18%
  cpu          : usr=0.08%, sys=0.08%, ctx=2100, majf=0, minf=118
  IO depths    : 1=105.3%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2057,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=137KiB/s (140kB/s), 137KiB/s-137KiB/s (140kB/s-140kB/s), io=8228KiB (8425kB), run=60038-60038msec