Re: Large latency for single thread

Mark,

Thanks for your reply. I ran the test on localhost, with no PG replication configured. Crimson may help me a lot, and I will do more tests.

I will also try the RBD persistent cache feature, since the client is latency-sensitive.
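For reference, something along these lines in ceph.conf should enable it (a minimal sketch, assuming a Pacific-or-later client; the cache path and size are placeholders):

    [client]
    # load the persistent write log cache plugin
    rbd_plugins = pwl_cache
    # 'ssd' mode works on a regular SSD; 'rwl' mode requires PMEM
    rbd_persistent_cache_mode = ssd
    # placeholder path on a local fast device
    rbd_persistent_cache_path = /mnt/pwl
    # placeholder size (1 GiB)
    rbd_persistent_cache_size = 1073741824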

P.S. Is crimson ready for production use yet?

On 12/16/21 3:53 AM, Mark Nelson wrote:
FWIW, for our Q3 crimson slide deck we ran single-OSD, iodepth=1 O_DSYNC write tests against both the classic and crimson bluestore OSDs. You can see the results starting on slide 32 here:


https://docs.google.com/presentation/d/1eydyAFKRea8n-VniQzXKW8qkKM9GLVMJt2uDjipJjQA/edit#slide=id.gf880cf6296_1_73


That was with the OSD restricted to 2 cores, but for these tests it shouldn't really matter.  Also keep in mind that the fio client was on localhost as well.  Note that Crimson is less efficient than the classic OSD in this test (while being more efficient in other tests) because the reactor spins in a tight polling loop to reduce latency, and since the OSD isn't doing a ton of IO, that polling ends up dominating CPU usage.  Seastar provides an option to make the reactor a bit lazier, which lowers idle CPU consumption, but we don't utilize it yet.
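For anyone who wants to run a comparable test, a fio job along these lines should approximate it (a sketch only: the device path is a placeholder for a kernel-mapped RBD image, and fdatasync=1 stands in for a true O_DSYNC open; the job name matches the results quoted below):

    [global]
    filename=/dev/rbd0   ; placeholder: a kernel-mapped RBD image
    ioengine=libaio
    direct=1
    fdatasync=1          ; flush after every write, approximating O_DSYNC
    rw=write             ; sequential writes
    bs=4k
    iodepth=1            ; queue depth 1
    numjobs=1
    time_based
    runtime=180

    [m-seqwr-004k-001q-001j]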


Running with replication across multiple OSDs (which requires round trips to multiple replicas) does make this tougher to do well on a real cluster.  I suspect that long term crimson should be better at this kind of workload than classic, but with synchronous replication we're always going to be fighting against the slowest link.


Mark

On 12/15/21 12:44 PM, Marc wrote:
Is this not just inherent to SDS? And wait for the new OSD code; I think they are working on it.

https://yourcmc.ru/wiki/Ceph_performance


m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
  write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
    slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
    clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
     lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
     | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
     | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
     | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
     | 99.99th=[ 6718]
   bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
   iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
  lat (usec)   : 750=2.98%, 1000=22.41%
  lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
  cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
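(For reading the numbers above: avg lat = avg slat + avg clat, i.e. 22.30 us + 1232.89 us = 1255.19 us, so essentially all of the ~1.25 ms per 4 KiB write is completion time waiting on the OSD rather than submission overhead.)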


Parts of the OSD's perf counters:

       "state_io_done_lat": {
              "avgcount": 151295,
              "sum": 0.336297058,
              "avgtime": 0.000002222
          },
          "state_kv_queued_lat": {
              "avgcount": 151295,
              "sum": 18.812333051,
              "avgtime": 0.000124342
          },
          "state_kv_commiting_lat": {
              "avgcount": 151295,
              "sum": 64.555436175,
              "avgtime": 0.000426685
          },
          "state_kv_done_lat": {
              "avgcount": 151295,
              "sum": 0.130403628,
              "avgtime": 0.000000861
          },
          "state_deferred_queued_lat": {
              "avgcount": 148,
              "sum": 215.726286547,
              "avgtime": 1.457610044
          },

... ...

          "op_w_latency": {
              "avgcount": 151133,
              "sum": 130.134246667,
              "avgtime": 0.000861057
          },
          "op_w_process_latency": {
              "avgcount": 151133,
              "sum": 125.301196872,
              "avgtime": 0.000829079
          },
          "op_w_prepare_latency": {
              "avgcount": 151133,
              "sum": 29.892687947,
              "avgtime": 0.000197790
          },
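These counters come from the OSD admin socket, and avgtime is simply sum / avgcount. A sketch of how they can be pulled, assuming the OSD id is 0:

    # dump all perf counters for osd.0 via its admin socket
    ceph daemon osd.0 perf dump

    # avgtime = sum / avgcount, e.g. for state_kv_commiting_lat:
    # 64.555436175 s / 151295 ops = 0.000426685 s (~427 us per write)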

Are these numbers reasonable for this benchmark, and how can they be improved?
It's really NOT friendly to a single-threaded workload.


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx