Hi,

I have a cluster in a "stale" state: a lot of RBD images have been blocked for about 10 hours. In the status I see PGs in the stale or down state, but those PGs don't seem to exist anymore:

root@stor00-sbg:~# ceph health detail | egrep '(stale|down)'
HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s); 16 pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%); Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515 objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized; 229 slow requests are blocked > 32 sec; 4074 stuck requests are blocked > 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-sbg,hyp02-sbg,hyp03-sbg are using a lot of disk space
PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale
    pg 31.8b is down, acting [2147483647,16,36]
    pg 31.8e is down, acting [2147483647,29,19]
    pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]

root@stor00-sbg:~# ceph pg 31.8b query
Error ENOENT: i don't have pgid 31.8b

root@stor00-sbg:~# ceph pg 31.8e query
Error ENOENT: i don't have pgid 31.8e

root@stor00-sbg:~# ceph pg 46.b8 query
Error ENOENT: i don't have pgid 46.b8

We just lost an HDD and marked the corresponding OSD as "lost".

Any idea what I should do?

Thanks,

Olivier

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com