Re: pg's degraded

Just to be clear, this is from a cluster that was healthy, had a disk replaced, and hasn't returned to healthy?  It's not a new cluster that has never been healthy, right?

Assuming it's an existing cluster, how many OSDs did you replace? It almost looks like you replaced multiple OSDs at the same time and lost data as a result.
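As a quick sanity check (a sketch; the pool names below are just the old defaults, substitute your own from `ceph osd lspools`): if you took down more OSDs at once than your pool size minus one, there was no surviving copy for Ceph to recover from.

    # List pools, then show the replica count ("size") for each one.
    ceph osd lspools
    ceph osd pool get data size        # e.g. "size: 2" means 2 copies
    ceph osd pool get metadata size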

Can you give us the output of `ceph osd tree` and `ceph pg 2.33 query`?
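While you're gathering that, the stuck PGs can also be listed directly; something like this should work (a sketch, the output format varies a bit by release):

    # Dump PGs stuck in each problem state, plus the overall health detail.
    ceph pg dump_stuck unclean
    ceph pg dump_stuck stale
    ceph health detail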


On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah <jshah2005@xxxxxx> wrote:
After rebuilding a few OSDs, I see that the PGs are stuck in degraded mode. Some are unclean and others are stale. Somehow the MDS is also degraded. How do I get the OSDs and the MDS back to healthy? I've read through the documentation and searched the web, but no luck so far.

pg 2.33 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 0.30 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 1.31 is stuck unclean since forever, current state stale+active+degraded, last acting [2]
pg 2.32 is stuck unclean for 597129.903922, current state stale+active+degraded, last acting [2]
pg 0.2f is stuck unclean for 597129.903951, current state stale+active+degraded, last acting [2]
pg 1.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 2.2d is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [2]
pg 0.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 1.2f is stuck unclean for 597129.904015, current state stale+active+degraded, last acting [2]
pg 2.2c is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 0.2d is stuck stale for 422844.566858, current state stale+active+degraded, last acting [2]
pg 1.2c is stuck stale for 422598.539483, current state stale+active+degraded+remapped, last acting [3]
pg 2.2f is stuck stale for 422598.539488, current state stale+active+degraded+remapped, last acting [3]
pg 0.2c is stuck stale for 422598.539487, current state stale+active+degraded+remapped, last acting [3]
pg 1.2d is stuck stale for 422598.539492, current state stale+active+degraded+remapped, last acting [3]
pg 2.2e is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
pg 0.2b is stuck stale for 422598.539491, current state stale+active+degraded+remapped, last acting [3]
pg 1.2a is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
pg 2.29 is stuck stale for 422598.539504, current state stale+active+degraded+remapped, last acting [3]
.
.
.
6 ops are blocked > 2097.15 sec
3 ops are blocked > 2097.15 sec on osd.0
2 ops are blocked > 2097.15 sec on osd.2
1 ops are blocked > 2097.15 sec on osd.4
3 osds have slow requests
recovery 40/60 objects degraded (66.667%)
mds cluster is degraded
mds.Lab-cephmon001 at X.X.16.111:6800/3424727 rank 0 is replaying journal

—Jiten


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


