On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables; we also have 10K PGs instead of the 30K / 40K we should have.
> The Linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2.
>
> On our side we have the following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> which explains why the OSDs are not flapping, but they are still misbehaving and generating the slow requests I am describing.
>
> The osd_op_complaint_time is at the default value (30 sec); not sure I want to change it based on your experience.

I wasn't saying you should set the complaint time to 5, just saying that's why I have complaints logged with such low block times.

> Thomas

And now I'm testing this:

osd recovery sleep = 0.5
osd snap trim sleep = 0.5

(or fiddling with them as low as 0.1 to make it rebalance faster), while also changing tunables to optimal (which will rebalance 75% of the objects).

This has had very good results so far (a few <14s blocks right at the start, and none since, over an hour ago). And I'm somehow hoping that will fix my rbd export-diff issue too... but it at least appears to fix the blocks caused by the rebalance.

Do you use rbd snapshots? I think that may be causing my issues, based on things like:

> "description": "osd_op(client.692201.0:20455419 4.1b5a5bc1
> rbd_data.94a08238e1f29.000000000000617b [] snapc 918d=[918d]
> ack+ondisk+write+known_if_redirected e40036)",
> "initiated_at": "2016-11-15 20:57:48.313432",
> "age": 409.634862,
> "duration": 3.377347,
> ...
> {
>     "time": "2016-11-15 20:57:48.313767",
>     "event": "waiting for subops from 0,1,8,22"
> },
> ...
> {
>     "time": "2016-11-15 20:57:51.688530",
>     "event": "sub_op_applied_rec from 22"
> },

which mentions "snapc" (CoW?), and which I think shows that just one OSD is delayed a few seconds while the rest are really fast, like you said. (And I'm not sure why I see 4 OSDs here when I have size 3... node1 has osd 0 and 1, and node3 has osd 8 and 22.)

Or some (shorter ones, I think) have a description like:

> osd_repop(client.426591.0:203051290 4.1f9
> 4:9fe4c001:::rbd_data.4cf92238e1f29.00000000000014ef:head v 40047'2531604)
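In case it's useful: op dumps like the ones above are the sort of thing you get from the OSD admin socket on the node hosting the suspect OSD (the id is just whichever OSD the "waiting for subops" event points at, e.g. 22 here):

  ceph daemon osd.22 dump_ops_in_flight    # ops currently in flight / blocked
  ceph daemon osd.22 dump_historic_ops     # recent completed ops with their event timelines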
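And if anyone wants to try the sleep settings and the tunables change on a running cluster, it's roughly something like the following (assuming those sleep options are accepted via injectargs at runtime on your version; otherwise they go into ceph.conf under [osd] and need an OSD restart):

  # throttle snap trimming and recovery a bit on all OSDs
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5 --osd_recovery_sleep 0.5'

  # switch CRUSH tunables to optimal; expect a large data movement
  ceph osd crush tunables optimal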