On Wed, Feb 18, 2015 at 6:56 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Tue, Feb 17, 2015 at 9:48 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>> On Tue, Feb 17, 2015 at 11:19 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>>>> Hello everyone,
>>>>
>>>> I'm seeing some OSD behavior that I consider unexpected; perhaps
>>>> someone can shed some light on it.
>>>>
>>>> Ceph giant (0.87.0), osd max backfills and osd recovery max active
>>>> both set to 1.
>>>>
>>>> Please take a moment to look at the following "ceph health detail"
>>>> screen dump:
>>>>
>>>> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean; recovery 16/65732491 objects degraded (0.000%); 328254/65732491 objects misplaced (0.499%)
>>>> pg 20.3db is stuck unclean for 13547.432043, current state active+remapped+wait_backfill, last acting [45,90,157]
>>>> pg 15.318 is stuck unclean for 13547.380581, current state active+remapped+wait_backfill, last acting [41,17,120]
>>>> pg 15.34a is stuck unclean for 13548.115170, current state active+remapped+wait_backfill, last acting [64,87,80]
>>>> pg 20.6f is stuck unclean for 13548.019218, current state active+remapped+wait_backfill, last acting [13,38,98]
>>>> pg 20.44c is stuck unclean for 13548.075430, current state active+remapped+wait_backfill, last acting [174,127,139]
>>>> pg 20.bc is stuck unclean for 13545.743397, current state active+remapped+wait_backfill, last acting [72,64,104]
>>>> pg 15.1ac is stuck unclean for 13548.181461, current state active+remapped+wait_backfill, last acting [121,145,84]
>>>> pg 15.1af is stuck unclean for 13547.962269, current state active+remapped+backfilling, last acting [150,62,101]
>>>> pg 20.396 is stuck unclean for 13547.835109, current state active+remapped+wait_backfill, last acting [134,49,96]
>>>> pg 15.1ba is stuck unclean for 13548.128752, current state active+remapped+wait_backfill, last acting [122,63,162]
>>>> pg 15.3fd is stuck unclean for 13547.644431, current state active+remapped+wait_backfill, last acting [156,38,131]
>>>> pg 20.41c is stuck unclean for 13548.133470, current state active+remapped+wait_backfill, last acting [78,85,168]
>>>> pg 20.525 is stuck unclean for 13545.272774, current state active+remapped+wait_backfill, last acting [76,57,148]
>>>> pg 15.1ca is stuck unclean for 13547.944928, current state active+remapped+wait_backfill, last acting [157,19,36]
>>>> pg 20.11e is stuck unclean for 13545.368614, current state active+remapped+wait_backfill, last acting [36,134,8]
>>>> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
>>>> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
>>>> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
>>>> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
>>>> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
>>>> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
>>>> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
>>>> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
>>>> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
>>>> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
>>>> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
>>>> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
>>>> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
>>>> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
>>>> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
>>>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491 objects misplaced (0.499%)
>>>>
>>>> As you can see, there is barely any overlap between the acting OSDs
>>>> for those PGs. osd max backfills should only limit the number of
>>>> concurrent backfills out of a single OSD, and so in the situation
>>>> above I would expect the 15 backfills to happen mostly concurrently.
>>>> As it is, they are being serialized, and that seems to needlessly
>>>> slow down the process and extend the time needed to complete
>>>> recovery.
>>>>
>>>> I'm pretty sure I'm missing something obvious here, but what is it?
>>>
>>> The max backfill values cover both incoming and outgoing backfills.
>>> Presumably these are all waiting on a small set of target OSDs which
>>> are currently receiving backfills of some other PG.
>>
>> Thanks for the reply, and I am aware of that, but I am not sure how it
>> applies here.
>>
>> What I quoted was the complete list of then-current backfills in the
>> cluster. Those are *all* the PGs affected by backfills, and they are
>> so scattered across OSDs that there is barely any overlap. The only
>> OSDs I even see listed twice are 38 and 64, which would affect PGs
>> 15.3fd/20.6f and 15.34a/20.bc. What is causing the others to wait?
>>
>> Or am I misunderstanding the "acting" value here, and some other OSDs
>> are involved? If so, how would I find out which ones those are?
>
> Yes, unless I'm misremembering. Look at the pg dump for those PGs and
> check out the "up" versus "acting" values. The "acting" ones are what
> the PG is currently remapped to; they're waiting to backfill onto the
> proper set of "up" OSDs.

I did run a ceph pg query on one of them and could have sworn that the
"up" and "acting" sets were identical. Stupidly, I neglected to save a
screen dump, so I can't double-check now. I hope I'll be able to
reproduce this down the road.

Cheers,
Florian
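
P.S.: For reference, a rough sketch of how to pull the "up" versus
"acting" sets Greg mentions, assuming the stock ceph CLI on a cluster
node; the PG id 20.3db and the OSD id 45 are just examples taken from
the listing above.

  # up set and acting set for a single PG, side by side
  ceph pg map 20.3db

  # full per-PG detail (including recovery/backfill state), saved so it
  # can be compared later instead of relying on a screen dump
  ceph pg 20.3db query > pg-20.3db-query.txt

  # every PG currently stuck unclean, with its up and acting sets
  ceph pg dump_stuck unclean

  # sanity-check the limit an OSD is actually running with
  # (run on the node hosting that OSD)
  ceph daemon osd.45 config show | grep osd_max_backfills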