Re: backfill_toofull while OSDs are not full

Strange, I can't reproduce this with v13.2.4.  I tried the following scenarios:

pg acting 1, 0, 2 -> up 1, 0, 4 (osd.2 marked out).  The df on osd.2 shows 0 space, but only osd.4 (backfill target) checks full space.

pg acting 1, 0, 2 -> up 4, 3, 5 (osd.1, 0, 2 all marked out).  The df for 1, 0, 2 shows 0 space, but osd.4, 3, 5 (backfill targets) check full space.

FYI, in a later release, even when a backfill target is below backfillfull_ratio, backfill_toofull still occurs if there isn't enough room for the PG to fit on it.
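
Roughly speaking (this is my paraphrase of the reservation check, not the
exact code): the backfill target rejects the reservation as too full when

    (bytes already used on the target OSD + expected size of the incoming PG)
        / OSD capacity  >=  backfillfull_ratio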


The question in your case is whether any of OSDs 999, 1900, or 145 was above 90% (backfillfull_ratio) usage.
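
For reference, something along these lines should show that quickly
(assuming the same numeric OSD IDs and a standard admin/client keyring):

    ceph osd df | egrep '^ *(999|1900|145) '
    ceph osd dump | grep ratio

The first line shows the current utilization of the three backfill targets,
the second the full/backfillfull/nearfull ratios the cluster is actually
using.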

David

On 1/27/19 11:34 PM, Wido den Hollander wrote:

On 1/25/19 8:33 AM, Gregory Farnum wrote:
This doesn’t look familiar to me. Is the cluster still doing recovery so
we can at least expect them to make progress when the “out” OSDs get
removed from the set?
The recovery has already finished. It resolved itself, but in the
meantime I saw many PGs in the backfill_toofull state for a long time.

This is new since Mimic.

Wido

On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander <wido@xxxxxxxx> wrote:

     Hi,

     I've got a couple of PGs which are stuck in backfill_toofull, but none
     of them are actually full.

       "up": [
         999,
         1900,
         145
       ],
       "acting": [
         701,
         1146,
         1880
       ],
       "backfill_targets": [
         "145",
         "999",
         "1900"
       ],
       "acting_recovery_backfill": [
         "145",
         "701",
         "999",
         "1146",
         "1880",
         "1900"
       ],

     I checked all these OSDs, but they are all <75% utilization.

     full_ratio 0.95
     backfillfull_ratio 0.9
     nearfull_ratio 0.9

     So I started checking all the PGs and I noticed that each of these
     PGs has one OSD in its 'acting_recovery_backfill' set which is marked
     as out.

     In this case osd.1880 is marked as out and thus its capacity is shown
     as zero.

     [ceph@ceph-mgr ~]$ ceph osd df|grep 1880
     1880   hdd 4.54599        0     0 B      0 B      0 B     0    0  27
     [ceph@ceph-mgr ~]$
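
     (For reference, PGs in this state can be listed with something like
     "ceph pg ls backfill_toofull", and each one can then be inspected with
     "ceph pg <pgid> query".)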

     This is on a Mimic 13.2.4 cluster. Is this expected or is this an
     unknown side-effect of one of the OSDs being marked as out?

     Thanks,

     Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



