Re: Unexpectedly low number of concurrent backfills

On Tue, Feb 17, 2015 at 9:48 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> On Tue, Feb 17, 2015 at 11:19 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>>> Hello everyone,
>>>
>>> I'm seeing some OSD behavior that I consider unexpected; perhaps
>>> someone can shed some insight.
>>>
>>> Ceph giant (0.87.0), osd max backfills and osd recovery max active
>>> both set to 1.
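
(For reference, a minimal sketch of how these two throttles can be inspected
and adjusted at runtime; the OSD id is arbitrary and the example values are
only illustrative.)

  # on the host running osd.45, read the live values via the admin socket
  ceph daemon osd.45 config get osd_max_backfills
  ceph daemon osd.45 config get osd_recovery_max_active

  # or inject new values cluster-wide without restarting any OSDs
  ceph tell osd.* injectargs '--osd-max-backfills 2 --osd-recovery-max-active 2'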
>>>
>>> Please take a moment to look at the following "ceph health detail" screen dump:
>>>
>>> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
>>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>>> objects misplaced (0.499%)
>>> pg 20.3db is stuck unclean for 13547.432043, current state
>>> active+remapped+wait_backfill, last acting [45,90,157]
>>> pg 15.318 is stuck unclean for 13547.380581, current state
>>> active+remapped+wait_backfill, last acting [41,17,120]
>>> pg 15.34a is stuck unclean for 13548.115170, current state
>>> active+remapped+wait_backfill, last acting [64,87,80]
>>> pg 20.6f is stuck unclean for 13548.019218, current state
>>> active+remapped+wait_backfill, last acting [13,38,98]
>>> pg 20.44c is stuck unclean for 13548.075430, current state
>>> active+remapped+wait_backfill, last acting [174,127,139]
>>> pg 20.bc is stuck unclean for 13545.743397, current state
>>> active+remapped+wait_backfill, last acting [72,64,104]
>>> pg 15.1ac is stuck unclean for 13548.181461, current state
>>> active+remapped+wait_backfill, last acting [121,145,84]
>>> pg 15.1af is stuck unclean for 13547.962269, current state
>>> active+remapped+backfilling, last acting [150,62,101]
>>> pg 20.396 is stuck unclean for 13547.835109, current state
>>> active+remapped+wait_backfill, last acting [134,49,96]
>>> pg 15.1ba is stuck unclean for 13548.128752, current state
>>> active+remapped+wait_backfill, last acting [122,63,162]
>>> pg 15.3fd is stuck unclean for 13547.644431, current state
>>> active+remapped+wait_backfill, last acting [156,38,131]
>>> pg 20.41c is stuck unclean for 13548.133470, current state
>>> active+remapped+wait_backfill, last acting [78,85,168]
>>> pg 20.525 is stuck unclean for 13545.272774, current state
>>> active+remapped+wait_backfill, last acting [76,57,148]
>>> pg 15.1ca is stuck unclean for 13547.944928, current state
>>> active+remapped+wait_backfill, last acting [157,19,36]
>>> pg 20.11e is stuck unclean for 13545.368614, current state
>>> active+remapped+wait_backfill, last acting [36,134,8]
>>> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
>>> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
>>> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
>>> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
>>> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
>>> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
>>> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
>>> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
>>> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
>>> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
>>> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
>>> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
>>> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
>>> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
>>> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
>>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>>> objects misplaced (0.499%)
>>>
>>> As you can see, there is barely any overlap between the acting OSDs
>>> for those PGs. osd max backfills should only limit the number of
>>> concurrent backfills out of a single OSD, and so in the situation
>>> above I would expect the 15 backfills to happen mostly concurrently.
>>> As it is, they are being serialized, which seems to needlessly slow
>>> down the process and extend the time needed to complete recovery.
>>>
>>> I'm pretty sure I'm missing something obvious here, but what is it?
>>
>> The max backfill values cover both incoming and outgoing backfills.
>> Presumably these are all waiting on a small set of target OSDs which
>> are currently receiving backfills of some other PG.
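
(For reference, a minimal sketch of how to see what a waiting PG is blocked
on; the PG id is taken from the listing above, and the exact field names in
the JSON output may differ between releases.)

  # query one of the PGs stuck in wait_backfill; the recovery_state section
  # shows which backfill reservation it is waiting for (a state along the
  # lines of WaitRemoteBackfillReserved names the throttled side)
  ceph pg 20.3db query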
>
> Thanks for the reply, and I am aware of that, but I am not sure how it
> applies here.
>
> What I quoted was the complete list of then-current backfills in the
> cluster. Those are *all* the PGs affected by backfills. And they're so
> scattered across OSDs that there is barely any overlap. The only OSDs
> I even see listed twice are 38 and 64, which would affect PGs
> 15.3fd/20.6f and 15.34a/20.bc. What is causing the others to wait?
>
> Or am I misunderstanding the "acting" value here and some other OSDs
> are involved, and if so, how would I find out what those are?

Yes, unless I'm misremembering. Look at the pg dump for those PGs and
check out the "up" versus "acting" values. The "acting" ones are what
the PG is currently remapped to; they're waiting to backfill onto the
proper set of "up" OSDs.
-Greg
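
(For reference, a minimal sketch of comparing "up" and "acting" for the PGs
in question; the PG id is taken from the listing above.)

  # print both sets for a single PG: "up" is where the data should end up,
  # "acting" is where it currently lives
  ceph pg map 15.1af

  # or dump all PGs and filter to the ones involved in backfill; the dump
  # includes both the up and acting columns
  ceph pg dump | grep backfill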
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



