Just pushed a fix to next, 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7.
Let me know if it persists. Thanks for the logs!
-Sam

On Fri, Nov 30, 2012 at 2:04 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
> Hah! Thanks for the log, it's our handling of active_pushes. I'll
> have a patch shortly.
>
> Thanks!
> -Sam
>
> On Fri, Nov 30, 2012 at 4:14 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> http://xdel.ru/downloads/ceph-log/ceph-scrub-stuck.log.gz
>> http://xdel.ru/downloads/ceph-log/cluster-w.log.gz
>>
>> Here, please.
>>
>> I initiated a deep-scrub of osd.1, which led to forever-stuck I/O
>> requests within a short time (a plain scrub will do the same). The
>> second log may be useful for proper timestamps, since seeking through
>> the original may take a long time. The osd processes on that node were
>> restarted twice - at the beginning, to be sure all config options were
>> applied, and at the end, to do the same plus get rid of the stuck
>> requests.
>>
>>
>> On Wed, Nov 28, 2012 at 5:35 AM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>> If you can reproduce it again, what we really need are the osd logs
>>> from the acting set of a pg stuck in scrub with
>>> debug osd = 20
>>> debug ms = 1
>>> debug filestore = 20.
>>>
>>> Thanks,
>>> -Sam
>>>
>>> On Sun, Nov 25, 2012 at 2:08 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>>>> On Fri, Nov 23, 2012 at 12:35 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>>> On Thu, 22 Nov 2012, Andrey Korolyov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> In recent versions Ceph shows some unexpected behavior with
>>>>>> permanent connections (VM or kernel clients) - after crash
>>>>>> recovery, I/O will hang at the next planned scrub in the following
>>>>>> scenario:
>>>>>>
>>>>>> - launch a bunch of clients doing non-intensive writes,
>>>>>> - lose one or more osds, mark them down, wait for recovery to complete,
>>>>>> - do a slow scrub, e.g. scrubbing one osd every 5m inside a bash
>>>>>> script, or wait for ceph to do the same,
>>>>>> - observe a rising number of pgs stuck in the active+clean+scrubbing
>>>>>> state (they took over the primary role from ones that were on the
>>>>>> killed osd, and almost surely they were being written to at the time
>>>>>> of the crash),
>>>>>> - some time later, clients hang hard and the ceph log reports
>>>>>> stuck (old) I/O requests.
>>>>>>
>>>>>> The only way to bring the clients back without losing their I/O
>>>>>> state is a per-osd restart, which also helps to get rid of the
>>>>>> active+clean+scrubbing pgs.
>>>>>>
>>>>>> First of all, I'll be happy to help solve this problem by providing
>>>>>> logs.
>>>>>
>>>>> If you can reproduce this behavior with 'debug osd = 20' and 'debug ms =
>>>>> 1' logging on the OSD, that would be wonderful!
>>>>>
>>>>
>>>> I have tested a slightly different recovery flow, please see below.
>>>> This time there is no real harm like frozen I/O, but placement groups
>>>> still got stuck forever in the active+clean+scrubbing state until I
>>>> restarted all osds (see the end of the log):
>>>>
>>>> http://xdel.ru/downloads/ceph-log/recover-clients-later-than-osd.txt.gz
>>>>
>>>> - start the healthy cluster
>>>> - start persistent clients
>>>> - add another host with a pair of OSDs and let them join the data placement
>>>> - wait for the data to rearrange
>>>> - [22:06 timestamp] mark the OSDs out or simply kill them and wait
>>>> (since I have a 1/2 hour delay on readjustment in that case, I ran
>>>> ``ceph osd out'' manually)
>>>> - watch the data rearrange again
>>>> - [22:51 timestamp] when it finishes, start a manual rescrub; at the
>>>> end of the process a non-zero number of placement groups are left in
>>>> the active+clean+scrubbing state and stay there forever until
>>>> something happens
>>>>
>>>> After that, I can restart the osds one by one if I want to get rid of
>>>> the scrubbing states immediately and then run a deep-scrub (if I
>>>> don't, those states will return at the next automatic scrub), or do a
>>>> per-osd deep-scrub if I have a lot of time. The case I described in
>>>> the previous message happens when I remove an osd that was already in
>>>> the data placement when the client(s) started, and it is indeed more
>>>> harmful than this one (frozen I/O hangs the entire guest, for
>>>> example). Since testing this flow took a lot of time, I'll send the
>>>> logs related to that case tomorrow.
>>>>
>>>>>> The second question is not directly related to this problem, but I
>>>>>> have thought about it for a long time - are there planned features
>>>>>> to control the scrub process more precisely, e.g. a pg scrub rate or
>>>>>> scheduled scrubs, instead of the current set of timeouts, which of
>>>>>> course are not very predictable as to when they run?
>>>>>
>>>>> Not yet. I would be interested in hearing what kind of control/config
>>>>> options/whatever you (and others) would like to see!
>>>>
>>>> Of course it would be awesome to have a deterministic scheduler, or at
>>>> least an option to disable automated scrubbing, since it is not very
>>>> predictable in time and a deep-scrub eats a lot of I/O when the
>>>> command is issued against an entire OSD. Rate limiting is not the
>>>> first priority - at least it can be recreated in an external script -
>>>> but for those who prefer to leave control to Ceph it would be very
>>>> useful.
>>>>
>>>> Thanks!
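
For anyone trying to reproduce this with the logging Sam asks for above,
a minimal sketch of one way to apply it. The [osd] section simply mirrors
his settings; the injectargs line for a running osd is an assumption
about the CLI of this vintage, so verify the exact syntax against your
installed version:

    # ceph.conf on the osd hosts (takes effect after an osd restart)
    [osd]
        debug osd = 20
        debug ms = 1
        debug filestore = 20

    # Tentative: inject the same settings into a running osd without a
    # restart; exact injectargs syntax varies between releases.
    ceph osd tell 1 injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'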
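
And a rough sketch of the kind of external per-osd slow-scrub loop
mentioned in the thread ("scrubbing one osd every 5m inside a bash
script"); the osd id list and the pause length are placeholders, and
swapping in 'ceph osd deep-scrub' gives the per-osd deep-scrub variant
Andrey refers to:

    #!/bin/bash
    # Scrub osds one at a time, pausing between them, instead of letting
    # ceph kick off scrubs on its own schedule.
    OSDS="0 1 2 3"                    # placeholder: adjust to the actual osd ids
    for id in $OSDS; do
        ceph osd scrub "$id"          # or: ceph osd deep-scrub "$id"
        sleep 300                     # 5 minutes between osds
    done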