On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables; we also have 10K PGs instead of the 30K / 40K we should have.
> The Linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2.
>
> On our side we have the following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> which explains why the OSDs are not flapping, but they are still misbehaving and generating the slow requests I am describing.
>
> The osd_op_complaint_time is at the default value (30 sec); not sure I want to change it based on your experience.

I wasn't saying you should set the complaint time to 5, just saying that's why I have complaints logged with such low block times.

> Thomas

And now I'm testing this:

osd recovery sleep = 0.5
osd snap trim sleep = 0.5

(or fiddling with them as low as 0.1 to make it rebalance faster), while also changing tunables to optimal (which will rebalance 75% of the objects).

This has had very good results so far (a few <14s blocks right at the start, and none since, over an hour ago). And I'm somehow hoping that will fix my rbd export-diff issue too... but it at least appears to fix the blocks caused by the rebalance.

Do you use rbd snapshots? I think that may be causing my issues, based on things like:

> "description": "osd_op(client.692201.0:20455419 4.1b5a5bc1
> rbd_data.94a08238e1f29.000000000000617b [] snapc 918d=[918d]
> ack+ondisk+write+known_if_redirected e40036)",
> "initiated_at": "2016-11-15 20:57:48.313432",
> "age": 409.634862,
> "duration": 3.377347,
> ...
> {
>     "time": "2016-11-15 20:57:48.313767",
>     "event": "waiting for subops from 0,1,8,22"
> },
> ...
> {
>     "time": "2016-11-15 20:57:51.688530",
>     "event": "sub_op_applied_rec from 22"
> },

which mentions "snapc" (CoW?), and which I think shows that just one OSD is delayed a few seconds while the rest are really fast, like you said. (And I'm not sure why I see 4 OSDs here when I have size 3... node1 has osd 0 and 1, and node3 has osd 8 and 22.)

Or some (shorter ones, I think) have a description like:

> osd_repop(client.426591.0:203051290 4.1f9
> 4:9fe4c001:::rbd_data.4cf92238e1f29.00000000000014ef:head v 40047'2531604)
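In case it's useful: op dumps like the ones above are the sort of thing you get from the OSD admin socket on the node hosting the suspect OSD (the id is just whichever OSD the "waiting for subops" event points at, e.g. 22 here):

  ceph daemon osd.22 dump_ops_in_flight    # ops currently in flight / blocked
  ceph daemon osd.22 dump_historic_ops     # recent completed ops with their event timelines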
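And if anyone wants to try the sleep settings and the tunables change on a running cluster, it's roughly something like the following (assuming those sleep options are accepted via injectargs at runtime on your version; otherwise they go into ceph.conf under [osd] and need an OSD restart):

  # throttle snap trimming and recovery a bit on all OSDs
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5 --osd_recovery_sleep 0.5'

  # switch CRUSH tunables to optimal; expect a large data movement
  ceph osd crush tunables optimal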