On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> Hello everyone,
>
> I'm seeing some OSD behavior that I consider unexpected; perhaps
> someone can shed some insight.
>
> Ceph giant (0.87.0), osd max backfills and osd recovery max active
> both set to 1.
>
> Please take a moment to look at the following "ceph health detail" screen dump:
>
> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
> objects misplaced (0.499%)
> pg 20.3db is stuck unclean for 13547.432043, current state
> active+remapped+wait_backfill, last acting [45,90,157]
> pg 15.318 is stuck unclean for 13547.380581, current state
> active+remapped+wait_backfill, last acting [41,17,120]
> pg 15.34a is stuck unclean for 13548.115170, current state
> active+remapped+wait_backfill, last acting [64,87,80]
> pg 20.6f is stuck unclean for 13548.019218, current state
> active+remapped+wait_backfill, last acting [13,38,98]
> pg 20.44c is stuck unclean for 13548.075430, current state
> active+remapped+wait_backfill, last acting [174,127,139]
> pg 20.bc is stuck unclean for 13545.743397, current state
> active+remapped+wait_backfill, last acting [72,64,104]
> pg 15.1ac is stuck unclean for 13548.181461, current state
> active+remapped+wait_backfill, last acting [121,145,84]
> pg 15.1af is stuck unclean for 13547.962269, current state
> active+remapped+backfilling, last acting [150,62,101]
> pg 20.396 is stuck unclean for 13547.835109, current state
> active+remapped+wait_backfill, last acting [134,49,96]
> pg 15.1ba is stuck unclean for 13548.128752, current state
> active+remapped+wait_backfill, last acting [122,63,162]
> pg 15.3fd is stuck unclean for 13547.644431, current state
> active+remapped+wait_backfill, last acting [156,38,131]
> pg 20.41c is stuck unclean for 13548.133470, current state
> active+remapped+wait_backfill, last acting [78,85,168]
> pg 20.525 is stuck unclean for 13545.272774, current state
> active+remapped+wait_backfill, last acting [76,57,148]
> pg 15.1ca is stuck unclean for 13547.944928, current state
> active+remapped+wait_backfill, last acting [157,19,36]
> pg 20.11e is stuck unclean for 13545.368614, current state
> active+remapped+wait_backfill, last acting [36,134,8]
> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
> objects misplaced (0.499%)
>
> As you can see, there is barely any overlap between the acting OSDs
> for those PGs. osd max backfills should only limit the number of
> concurrent backfills out of a single OSD, and so in the situation
> above I would expect the 15 backfills to happen mostly concurrently.
> As it is they are being serialized, and that seems to needlessly slow
> down the process and extend the time needed to complete recovery.
>
> I'm pretty sure I'm missing something obvious here, but what is it?

The max backfill values cover both incoming and outgoing backfills.
Presumably these are all waiting on a small set of target OSDs which
are currently receiving backfills of some other PG.
-Greg
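
To illustrate the point about incoming backfills, here is a rough Python sketch. It is a deliberately simplified model, not Ceph's actual reservation code: each backfill needs one free outgoing slot on the source OSD and one free incoming slot on the target OSD, both capped by the same limit. The function name schedule and the target OSD id 200 are hypothetical; the health output above lists only acting sets, so the real targets are unknown. The source ids are simply the first OSD of each acting set shown above.

```python
from collections import Counter

def schedule(pgs, max_backfills=1):
    """Greedy single pass over (pgid, source_osd, target_osd) tuples.

    A backfill starts only if the source has a free outgoing slot AND
    the target has a free incoming slot. Simplified model, not Ceph code.
    """
    outgoing = Counter()  # running backfills per source OSD
    incoming = Counter()  # running backfills per target OSD
    running, waiting = [], []
    for pgid, source, target in pgs:
        if outgoing[source] < max_backfills and incoming[target] < max_backfills:
            outgoing[source] += 1
            incoming[target] += 1
            running.append(pgid)
        else:
            waiting.append(pgid)
    return running, waiting

# Sources taken from the acting sets above; target 200 is a made-up OSD id
# standing in for "a small set of target OSDs".
pgs = [
    ("15.1af", 150, 200),
    ("20.3db", 45, 200),
    ("15.318", 41, 200),
    ("20.6f", 13, 200),
]
running, waiting = schedule(pgs, max_backfills=1)
print("backfilling:", running)
print("wait_backfill:", waiting)
```

Even though the four source OSDs are all different, only one PG backfills at a time in this model because they share a target. On a real cluster, "ceph pg map <pgid>" prints both the up and acting sets of a PG, which should show whether the waiting PGs are headed for the same OSDs.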