On Mon, 22 Jul 2013, Gaylord Holder wrote:
>
> I have a 12 OSD/3 host setup, and am stuck with a bunch of stuck pgs.
>
> I've verified the OSDs are all up and in. The crushmap looks fine.
> I've tried restarting all the daemons.
>
> root@never:/var/lib/ceph/mon# ceph status
>    health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean; recovery
>           216/6213 degraded (3.477%)
>    monmap e4: 2 mons at {a=192.168.225.9:6789/0,b=192.168.225.10:6789/0},
>           election epoch 14, quorum 0,1 a,b

Add another monitor; right now if one fails the cluster is unavailable.

>    osdmap e238: 12 osds: 12 up, 12 in
>    pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped, 139
>          active+degraded; 8218 MB data, 103 GB used, 22241 GB / 22345 GB
>          avail; 216/6213 degraded (3.477%)
>    mdsmap e1: 0/0/1 up

My guess is crush tunables. Try

	ceph osd crush tunables optimal

unless you are using a pre-3.8(ish) kernel or other very old
(pre-bobtail) clients.

sage

> I have one non-default pool with 3x replication. Fewer than half of the
> pgs have expanded to 3x (278/400 pgs still have acting 2x sets).
>
> Where can I go look for the trouble?
>
> Thank you for any light someone can shed on this.
>
> Cheers,
> -Gaylord

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
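A minimal command sketch of the two suggestions above, for anyone hitting
the same symptoms. Assumptions: default cluster paths, and the third
monitor's name and address (c, 192.168.225.11) are placeholders for the
poster's third host. Note that "ceph mon add" only registers the monitor
in the monmap; the ceph-mon daemon still has to be provisioned (mkfs'd)
and started on that host separately.

	# Switch to modern CRUSH tunables, per Sage's suggestion
	# (avoid if you have pre-bobtail clients or kernel clients
	# older than ~3.8):
	ceph osd crush tunables optimal

	# Watch the stuck/degraded counts fall as PGs remap:
	ceph status
	ceph pg dump_stuck unclean

	# Register a third monitor so quorum survives one failure
	# (placeholder name/address; provision the daemon on that
	# host before or right after this step):
	ceph mon add c 192.168.225.11:6789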