Re: Slow requests during ceph osd boot

On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> We have the same problems; we need to start the OSDs slowly.
> The problem seems to be CPU congestion. A booting OSD will use all the CPU power you give it, and if it doesn’t have enough, nasty stuff happens (this might actually be the manifestation of some kind of problem in our setup as well).
> It doesn’t always do that - I was restarting our hosts this weekend and most of them came up fine with a simple “service ceph start”, but some just sat there spinning the CPU and not doing any real work (and the cluster was not very happy about that).
>
> Jan
>
>
>> On 15 Jul 2015, at 10:53, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>>
>> Hello,
>> after some trial and error we concluded that if we start the 6 stopped
>> OSD daemons with a delay of 1 minute between each, we do not experience
>> slow requests (the threshold is set at 30 sec), although there are still
>> some ops that take up to 10 s, which is already quite high. I assume that
>> if we spread the delay out more, the slow requests will vanish. We cannot
>> rule out that our setup is not tuned to the finest detail, but I wonder
>> whether we are missing some Ceph tuning in terms of configuration.
>>
>> We run firefly latest stable version.
>>
>> Regards,
>> Kostis
>>
>> On 13 July 2015 at 13:28, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
>>> Hello,
>>> after rebooting a ceph node, while the OSDs are booting and rejoining
>>> the cluster, we experience slow requests that get resolved immediately
>>> after the cluster recovers. It is important to note that before the node
>>> reboot we set the noout flag in order to prevent recovery - so there are
>>> only degraded PGs while the OSDs are down - and let the cluster handle
>>> the OSDs going down/up in the lightest way possible.
>>>
>>> Is there any tunable we should consider in order to avoid service
>>> degradation for our ceph clients?
>>>
>>> Regards,
>>> Kostis
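For reference, the noout-plus-staggered-start procedure described in the quoted messages above might look roughly like this on a firefly node with sysvinit (a sketch only; the OSD ids, the 60-second delay and the loop are illustrative, not the posters' exact commands):

    # Before the reboot: prevent the OSDs from being marked out (and data
    # from being re-replicated) while the host is down.
    ceph osd set noout

    # ... reboot the node, then on the rebooted host start the OSDs one at
    # a time instead of all at once (Kostis used a ~1 minute delay for his
    # 6 OSDs); osd.10..osd.15 are placeholder ids.
    for id in 10 11 12 13 14 15; do
        service ceph start osd.$id
        sleep 60
    done

    # Once "ceph -s" reports the OSDs up and the PGs active+clean, remove
    # the flag again.
    ceph osd unset noout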


As far as I've seen this problem, the main issue for regular
disk-backed OSDs is IOPS starvation during the interval after the OSD
reads its maps from the filestore and marks itself 'in': even if the
in-memory caches are still hot, I/O degrades significantly for a short
period. A possible workaround for an otherwise healthy cluster and a
node-wide restart is to set the norecover flag; it greatly reduces the
chance of hitting slow operations. Of course this is applicable only to
a non-empty cluster with tens of percent average utilization on
rotating media. I first pointed out this issue a couple of years ago
(it *does* break the 30s I/O SLA for the returning OSD, whereas
refilling the same OSD from scratch would not violate that SLA, at the
cost of a far longer completion time for the refill). From the
usability side, it would be great to introduce some kind of recovery
throttle for newly started OSDs, as osd_recovery_delay_start does not
prevent the immediate recovery procedures.
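
For what it's worth, a norecover-based restart of a whole node might look something like this (a sketch only; the flags and commands exist in firefly, but the exact sequencing below is my suggestion rather than a tested runbook):

    # Before rebooting the node: keep the OSDs "in" and hold off recovery,
    # so the returning OSDs are not hit with recovery reads while they are
    # still reading maps from the filestore and peering.
    ceph osd set noout
    ceph osd set norecover

    # ... reboot the node, let the OSDs start and finish peering ...

    # When "ceph -s" shows the OSDs back up, allow recovery of whatever
    # was written while the node was down, then re-enable marking out.
    ceph osd unset norecover
    ceph osd unset noout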
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



