Large latency for single thread

I created an RBD pool using only two SATA SSDs (one for data, the other for the RocksDB database and WAL), and set the replica size to 1.

After that, I ran a fio test on the same host where the OSD is placed. The latency I measured is in the hundreds of microseconds (versus about sixty microseconds for the raw SATA SSD).
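
A minimal fio job file matching the shape of this run (the rbd ioengine and the pool/image/client names below are assumptions, not taken from the actual setup; block size, queue depth, job count and runtime follow the job name and the output):

[m-seqwr-004k-001q-001j]
ioengine=rbd        ; assumption: a mapped krbd device with ioengine=libaio would fit as well
clientname=admin    ; hypothetical cephx user
pool=rbdpool        ; hypothetical pool name
rbdname=testimg     ; hypothetical image name
rw=write
bs=4k
iodepth=1
numjobs=1
runtime=180
time_based=1
direct=1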

The fio output:

m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
  write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
    slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
    clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
     lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
     | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
     | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
     | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
     | 99.99th=[ 6718]
   bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
   iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
  lat (usec)   : 750=2.98%, 1000=22.41%
  lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
  cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
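
(At queue depth 1 the throughput follows directly from the latency: 1 / 0.001255 s ≈ 797 IOPS, which matches the reported 794 IOPS, so this test is purely latency-bound.)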


Part of the OSD's perf counters:

     "state_io_done_lat": {
            "avgcount": 151295,
            "sum": 0.336297058,
            "avgtime": 0.000002222
        },
        "state_kv_queued_lat": {
            "avgcount": 151295,
            "sum": 18.812333051,
            "avgtime": 0.000124342
        },
        "state_kv_commiting_lat": {
            "avgcount": 151295,
            "sum": 64.555436175,
            "avgtime": 0.000426685
        },
        "state_kv_done_lat": {
            "avgcount": 151295,
            "sum": 0.130403628,
            "avgtime": 0.000000861
        },
        "state_deferred_queued_lat": {
            "avgcount": 148,
            "sum": 215.726286547,
            "avgtime": 1.457610044
        },

... ...

        "op_w_latency": {
            "avgcount": 151133,
            "sum": 130.134246667,
            "avgtime": 0.000861057
        },
        "op_w_process_latency": {
            "avgcount": 151133,
            "sum": 125.301196872,
            "avgtime": 0.000829079
        },
        "op_w_prepare_latency": {
            "avgcount": 151133,
            "sum": 29.892687947,
            "avgtime": 0.000197790
        },
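
For reference, these figures come from the OSD admin socket, e.g. (osd.0 is only an example id):

    ceph daemon osd.0 perf dump

Reading the numbers: state_kv_queued_lat (~0.124 ms) plus state_kv_commiting_lat (~0.427 ms) account for roughly 0.55 ms of the 0.861 ms op_w_latency, so most of the OSD-side time is spent queuing for and committing the RocksDB transaction; the remaining gap up to the ~1.25 ms seen by fio is presumably messenger and client-side overhead.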

Is this latency reasonable for this benchmark case? And how can I improve it? It is really not friendly to single-threaded workloads.


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



