ghost PG : "i don't have pgid xx"

Olivier Bonvalet <ceph.list@xxxxxxxxx> · Tue, 05 Jun 2018 09:25:49 +0200

Hi,

I have a cluster in "stale" state: a lot of RBD images have been blocked for
~10 hours. In the status I see PGs in stale or down state, but those PGs
don't seem to exist anymore:

root@stor00-sbg:~# ceph health detail | egrep '(stale|down)'
HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s); 16 pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%); Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515 objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized; 229 slow requests are blocked > 32 sec; 4074 stuck requests are blocked > 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-sbg,hyp02-sbg,hyp03-sbg are using a lot of disk space
PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale
    pg 31.8b is down, acting [2147483647,16,36]
    pg 31.8e is down, acting [2147483647,29,19]
    pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
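For reference, 2147483647 in an acting set is 0x7fffffff (CRUSH_ITEM_NONE in Ceph's CRUSH code), meaning no OSD is currently mapped at that position. Below is a minimal Python sketch that lists the empty slots for each PG. It assumes the "pg <pool>.<seed> is <state>, acting [...]" line format shown above. The helper name `parse_pg_lines` is hypothetical and not part of any Ceph tool.

```python
import re

# 0x7fffffff: the value Ceph prints (CRUSH_ITEM_NONE) when no OSD
# is mapped to that position of the acting set.
CRUSH_ITEM_NONE = 2147483647

# Matches lines such as "pg 31.8b is down, acting [2147483647,16,36]".
# Assumption: this is the format used by `ceph health detail` in this release.
PG_LINE = re.compile(r"pg\s+(\d+)\.([0-9a-f]+)\s+is\s+([\w+]+),\s+acting\s+\[([\d,]*)\]")


def parse_pg_lines(text):
    """Return one dict per matching PG line: pgid, pool, state, acting set, empty slots."""
    results = []
    for line in text.splitlines():
        m = PG_LINE.search(line)
        if not m:
            continue
        pool, seed, state, acting_raw = m.groups()
        acting = [int(x) for x in acting_raw.split(",") if x]
        results.append({
            "pgid": f"{pool}.{seed}",
            "pool": int(pool),
            "state": state,
            "acting": acting,
            # Positions in the acting set that have no OSD assigned.
            "missing": [i for i, osd in enumerate(acting) if osd == CRUSH_ITEM_NONE],
        })
    return results


if __name__ == "__main__":
    sample = """\
    pg 31.8b is down, acting [2147483647,16,36]
    pg 31.8e is down, acting [2147483647,29,19]
    pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
"""
    for pg in parse_pg_lines(sample):
        print(pg["pgid"], pg["state"], "empty slots:", pg["missing"])
```

On the output above, this sketch reports one empty slot (position 0) for 31.8b and 31.8e, and two (positions 0 and 1) for 46.b8.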

root@stor00-sbg:~# ceph pg 31.8b query
Error ENOENT: i don't have pgid 31.8b

root@stor00-sbg:~# ceph pg 31.8e query
Error ENOENT: i don't have pgid 31.8e

root@stor00-sbg:~# ceph pg 46.b8 query
Error ENOENT: i don't have pgid 46.b8

We just lost an HDD and marked the corresponding OSD as "lost".

Any idea what I should do?

Thanks,

Olivier
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com