OSD 140 is 73.61% used and its backfill_full_ratio is also 0.85
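(For reference, roughly how those two numbers can be read back; the "ceph daemon" call has to run on the host that carries osd.140, and osd_backfill_full_ratio is the Hammer-era option name, so adjust to your setup:)

ceph osd df | awk '$1 == 140'                              # %USE column shows the 73.61%
ceph daemon osd.140 config get osd_backfill_full_ratio     # per-OSD ratio, default 0.85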
---------- Forwarded message ----------
From: Vincent Godin <vince.mlist@xxxxxxxxx>
Date: 2016-07-25 17:35 GMT+02:00
Subject: 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???
To: ceph-users@xxxxxxxxxxxxxx
When I do a ceph health detail, I can see:
pg 8.c1 is stuck unclean for 21691.555742, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is stuck undersized for 21327.027365, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is stuck degraded for 21327.035186, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is active+undersized+degraded+remapped+wait_backfill+backfill_toofull, acting [140]
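(The four lines above come straight from ceph health detail; piping it through a grep on the PG id, as below, is just an illustrative way to narrow the output to this PG:)

ceph health detail | grep '8\.c1'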
If I query the pg 8.c1, the recovery section gives:
"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2016-07-25 11:24:48.658971",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [
"80",
"151"
],
"waiting_on_backfill": [],
"last_backfill_started": "0\/\/0\/\/-1",
"backfill_info": {
"begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
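(The JSON above is the recovery_state section of "ceph pg 8.c1 query"; one way to pull out just that section, assuming a Python interpreter is available on the admin host, is:)

ceph pg 8.c1 query | python -c 'import json,sys; print(json.dumps(json.load(sys.stdin)["recovery_state"], indent=2))'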
The problem is that when I do a "ceph osd df" on osd 80 and osd 151, I can see that osd 80 is used at 16.19% and osd 151 at 81.98% (osd 80 to osd 99 are recovering, which is why they are so empty).
If I look at the backfill_full_ratio of these two OSDs, I find 0.85 for both of them.
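(Again purely for illustration, the usage of the two backfill targets can be re-checked like this; their per-OSD backfill_full_ratio can be read over the admin socket in the same way as shown for osd.140 above:)

ceph osd df | awk '$1 == 80 || $1 == 151'   # %USE column for the two backfill targets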
From: Vincent Godin <vince.mlist@xxxxxxxxx>
Date: 2016-07-25 17:35 GMT+02:00
Subject: 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???
To: ceph-users@xxxxxxxxxxxxxx
Hi,
I'm facing this problem. The cluster is running Hammer 0.94.5. When I do a ceph health detail, I can see:
pg 8.c1 is stuck unclean for 21691.555742, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is stuck undersized for 21327.027365, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is stuck degraded for 21327.035186, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140]
pg 8.c1 is active+undersized+degraded+remapped+wait_backfill+backfill_toofull, acting [140]
"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2016-07-25 11:24:48.658971",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [
"80",
"151"
],
"waiting_on_backfill": [],
"last_backfill_started": "0\/\/0\/\/-1",
"backfill_info": {
"begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
So why is the pg in a backfill_toofull state?
Thanks for your help
Vincent