On 1/30/19 9:08 PM, David Zafman wrote:
>
> Strange, I can't reproduce this with v13.2.4. I tried the following
> scenarios:
>
> pg acting 1, 0, 2 -> up 1, 0, 4 (osd.2 marked out). The df on osd.2
> shows 0 space, but only osd.4 (backfill target) checks full space.
>
> pg acting 1, 0, 2 -> up 4, 3, 5 (osd.1, 0, 2 all marked out). The df for
> 1, 0, 2 shows 0 space, but osd.4, 3, 5 (backfill targets) check full space.
>
> FYI, in a later release, even when a backfill target is below
> backfillfull_ratio, if there isn't enough room for the pg to fit then
> backfill_toofull occurs.
>
>
> The question in your case is whether any of OSDs 999, 1900, or 145 was
> above 90% (backfillfull_ratio) usage.

I triple-checked and this was not the case. I've had two instances of
Mimic 13.2.4 where I ran into this, and had somebody else report it to me.

In a few weeks I'll be performing an expansion with a customer where I'm
expecting this to show up again.

I'll check again, note the usage on all OSDs, and report back.

Wido

>
> David
>
> On 1/27/19 11:34 PM, Wido den Hollander wrote:
>>
>> On 1/25/19 8:33 AM, Gregory Farnum wrote:
>>> This doesn’t look familiar to me. Is the cluster still doing recovery so
>>> we can at least expect them to make progress when the “out” OSDs get
>>> removed from the set?
>> The recovery has already finished. It resolves itself, but in the
>> meantime I saw many PGs in the backfill_toofull state for a long time.
>>
>> This is new since Mimic.
>>
>> Wido
>>
>>> On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander <wido@xxxxxxxx>
>>> wrote:
>>>
>>>     Hi,
>>>
>>>     I've got a couple of PGs which are stuck in backfill_toofull,
>>>     but none of them are actually full.
>>>
>>>     "up": [
>>>         999,
>>>         1900,
>>>         145
>>>     ],
>>>     "acting": [
>>>         701,
>>>         1146,
>>>         1880
>>>     ],
>>>     "backfill_targets": [
>>>         "145",
>>>         "999",
>>>         "1900"
>>>     ],
>>>     "acting_recovery_backfill": [
>>>         "145",
>>>         "701",
>>>         "999",
>>>         "1146",
>>>         "1880",
>>>         "1900"
>>>     ],
>>>
>>>     I checked all these OSDs, but they are all <75% utilization.
>>>
>>>     full_ratio 0.95
>>>     backfillfull_ratio 0.9
>>>     nearfull_ratio 0.9
>>>
>>>     So I started checking all the PGs and I've noticed that each of
>>>     these PGs has one OSD in the 'acting_recovery_backfill' which is
>>>     marked as out.
>>>
>>>     In this case osd.1880 is marked as out and thus its capacity is
>>>     shown as zero.
>>>
>>>     [ceph@ceph-mgr ~]$ ceph osd df|grep 1880
>>>     1880 hdd 4.54599 0 0 B 0 B 0 B 0 0 27
>>>     [ceph@ceph-mgr ~]$
>>>
>>>     This is on a Mimic 13.2.4 cluster. Is this expected or is this an
>>>     unknown side-effect of one of the OSDs being marked as out?
>>>
>>>     Thanks,
>>>
>>>     Wido
>>>     _______________________________________________
>>>     ceph-users mailing list
>>>     ceph-users@xxxxxxxxxxxxxx
>>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
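
For anyone wanting to repeat the check discussed in this thread (were any of the OSDs in a PG's acting_recovery_backfill set above backfillfull_ratio, and do any of them report zero capacity because they are marked out), here is a minimal Python sketch. It is not taken from Ceph and does not call any Ceph API. The function name classify_osds and the (used, total) input layout are hypothetical, and the usage numbers in the example are placeholders, not the values from the cluster above. It only does the arithmetic on figures you have already collected, for example by hand from "ceph osd df".

```python
from typing import Dict, Iterable, List, Tuple


def classify_osds(
    osd_ids: Iterable[int],
    usage: Dict[int, Tuple[int, int]],
    backfillfull_ratio: float = 0.9,
) -> Tuple[List[int], List[int]]:
    """Split OSDs into (over_ratio, zero_capacity).

    usage maps an OSD id to (used_bytes, total_bytes). This layout is a
    hypothetical convenience for the sketch, not a Ceph output format.
    """
    over_ratio: List[int] = []
    zero_capacity: List[int] = []
    for osd in osd_ids:
        used, total = usage[osd]
        if total == 0:
            # An OSD marked out shows 0 size in "ceph osd df"; report it
            # separately instead of dividing by zero.
            zero_capacity.append(osd)
        elif used / total > backfillfull_ratio:
            over_ratio.append(osd)
    return over_ratio, zero_capacity


# OSD ids are the acting_recovery_backfill set from the original post.
# The (used, total) pairs are illustrative placeholders only.
acting_recovery_backfill = [145, 701, 999, 1146, 1880, 1900]
example_usage = {
    145: (70, 100),
    701: (70, 100),
    999: (70, 100),
    1146: (70, 100),
    1880: (0, 0),  # marked out, so it reports zero capacity
    1900: (70, 100),
}

over, zero = classify_osds(acting_recovery_backfill, example_usage, 0.9)
print("above backfillfull_ratio:", over)
print("reporting zero capacity:", zero)
```

With placeholder numbers like these, no OSD lands in the first list and only osd.1880 lands in the second, which is the situation described in the original report: nothing is actually above the ratio, but one member of the set shows no capacity.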