Re: Large latency for single thread


 



Hi Norman,


A persistent client-side cache may help in this case if you are OK with the trade-offs.  It's been a while since I've seen any benchmarks with it, so you may need to do some testing yourself.
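
If you want to experiment with it, a minimal client-side sketch would look roughly like the following (option names are from the pwl_cache plugin in recent releases; the path and size here are placeholders to adjust):

    # illustrative values only; check the docs for your release
    [client]
        rbd_plugins = pwl_cache
        rbd_persistent_cache_mode = ssd
        # fast local device on the client host
        rbd_persistent_cache_path = /mnt/pwl
        # 1 GiB
        rbd_persistent_cache_size = 1073741824

The same options can also be set per image with "rbd config image set <pool>/<image> ..." if you only want the cache on latency-sensitive volumes.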


Crimson is not ready for production use at this time, so I would focus on the existing OSD with bluestore.


Mark


On 12/21/21 8:40 PM, norman.kern wrote:
Mark,

Thanks for your reply. I ran the test on the local host with no replica PGs configured.  Crimson may help me a lot, and I will do more tests.

I will also try the rbd persistent cache feature, since the client is sensitive to latency.

P.S. Can crimson be used in production yet?

On 12/16/21 3:53 AM, Mark Nelson wrote:
FWIW, we ran single OSD, iodepth=1 O_DSYNC write tests against classic and crimson bluestore OSDs in our Q3 crimson slide deck. You can see the results starting on slide 32 here:


https://docs.google.com/presentation/d/1eydyAFKRea8n-VniQzXKW8qkKM9GLVMJt2uDjipJjQA/edit#slide=id.gf880cf6296_1_73


That was with the OSD restricted to 2 cores, but for these tests it shouldn't really matter.  Also keep in mind that the fio client was on localhost as well.  Note that Crimson is less efficient than the classic OSD in this test (while being more efficient in other tests) because the reactor spins in a tight loop to reduce latency, and since the OSD isn't doing a ton of IO, that spinning ends up dominating CPU usage.  Seastar provides an option to make the reactor lazier, which lowers idle CPU consumption, but we don't utilize it yet.
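
If you want to run a similar single-queue-depth O_DSYNC write test yourself, a job file along these lines should be close (the device path and job name are placeholders, and the exact sync flag spelling depends on your fio version):

    # hypothetical fio job file; filename and job name are placeholders
    [global]
    # a kernel-mapped RBD image (rbd map <pool>/<image>)
    filename=/dev/rbd0
    # psync = one synchronous IO at a time, i.e. queue depth 1
    ioengine=psync
    direct=1
    # O_DSYNC writes; on older fio versions use sync=1 (O_SYNC) instead
    sync=dsync
    rw=write
    bs=4k
    runtime=180
    time_based=1

    [qd1-dsync-write]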


Running with replication across multiple OSDs (which requires round trips to multiple replicas) does make this tougher to do well on a real cluster.  I suspect that in the long term crimson should be better at this kind of workload vs classic, but with synchronous replication we're always going to be fighting against the slowest link.


Mark

On 12/15/21 12:44 PM, Marc wrote:
Isn't this just inherent to SDS? Otherwise wait for the new OSD code; I think they are working on it.

https://yourcmc.ru/wiki/Ceph_performance


m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
    write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
      slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
      clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
       lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
      clat percentiles (usec):
       |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
       | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
       | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
       | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
       | 99.99th=[ 6718]
     bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
     iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
    lat (usec)   : 750=2.98%, 1000=22.41%
    lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
    cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=1
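
(Quick sanity check on those numbers: at iodepth=1 the IOPS are just the reciprocal of the per-op latency, i.e. 1 / 1.255 ms ≈ 797, which matches the 794 IOPS fio reports, so the throughput here is purely latency-bound.)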


Parts of the OSD's perf counters:

       "state_io_done_lat": {
              "avgcount": 151295,
              "sum": 0.336297058,
              "avgtime": 0.000002222
          },
          "state_kv_queued_lat": {
              "avgcount": 151295,
              "sum": 18.812333051,
              "avgtime": 0.000124342
          },
          "state_kv_commiting_lat": {
              "avgcount": 151295,
              "sum": 64.555436175,
              "avgtime": 0.000426685
          },
          "state_kv_done_lat": {
              "avgcount": 151295,
              "sum": 0.130403628,
              "avgtime": 0.000000861
          },
          "state_deferred_queued_lat": {
              "avgcount": 148,
              "sum": 215.726286547,
              "avgtime": 1.457610044
          },

... ...

          "op_w_latency": {
              "avgcount": 151133,
              "sum": 130.134246667,
              "avgtime": 0.000861057
          },
          "op_w_process_latency": {
              "avgcount": 151133,
              "sum": 125.301196872,
              "avgtime": 0.000829079
          },
          "op_w_prepare_latency": {
              "avgcount": 151133,
              "sum": 29.892687947,
              "avgtime": 0.000197790
          },
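
These counters come from the OSD admin socket, i.e. something like the following (osd.0 stands in for the actual id):

    ceph daemon osd.0 perf dump

Comparing them with the fio output above: op_w_latency averages about 0.86 ms inside the OSD, while the client-side clat average is about 1.23 ms, so roughly 0.4 ms per op is spent outside the OSD's write path (messenger and client, presumably).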

Is this reasonable for this benchmark test case, and how can I improve it?
It's really not friendly to single-threaded clients.


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



