Sage,
The crush tunables did the trick.
But why? Could you explain what was causing the problem?
I haven't installed 3.9 on my RBD servers yet. Will setting the crush
tunables back to default or legacy cause me similar problems in the future?
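For reference, the tunables currently compiled into the CRUSH map can be
inspected by decompiling it; non-default tunables show up as "tunable ..."
lines (a sketch only, and the output format varies by release):

  ceph osd getcrushmap -o /tmp/crushmap      # dump the binary CRUSH map
  crushtool -d /tmp/crushmap | grep tunable  # non-default tunables, if any
  uname -r    # on each kernel RBD client, to confirm it is new enough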
Thank you again Sage!
-Gaylord
On 07/22/2013 02:27 PM, Sage Weil wrote:
On Mon, 22 Jul 2013, Gaylord Holder wrote:
I have a 12 OSD / 3 host setup and have ended up with a bunch of stuck pgs.
I've verified the OSDs are all up and in. The crushmap looks fine.
I've tried restarting all the daemons.
root@never:/var/lib/ceph/mon# ceph status
health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean; recovery
216/6213 degraded (3.477%)
monmap e4: 2 mons at {a=192.168.225.9:6789/0,b=192.168.225.10:6789/0},
election epoch 14, quorum 0,1 a,b
Add another monitor; right now, if one fails, the cluster is unavailable. (A rough sketch of adding a third mon follows the status output.)
osdmap e238: 12 osds: 12 up, 12 in
pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped, 139
active+degraded; 8218 MB data, 103 GB used, 22241 GB / 22345 GB avail;
216/6213 degraded (3.477%)
mdsmap e1: 0/0/1 up
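For reference, a rough sketch of adding a third monitor. The name "c", the
address 192.168.225.11, and the keyring path are hypothetical, and the exact
steps depend on how the cluster was deployed:

  ceph mon getmap -o /tmp/monmap     # fetch the current monitor map
  ceph-mon -i c --mkfs --monmap /tmp/monmap --keyring /path/to/mon.keyring
                                     # initialize the new mon's data directory
  ceph mon add c 192.168.225.11:6789 # register the new mon in the monmap
  service ceph start mon.c           # start it (sysvinit; adjust for your init system)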
My guess is crush tunables. Try
ceph osd crush tunables optimal
unless you are using a pre-3.8(ish) kernel or other very old (pre-bobtail)
clients.
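For reference, a sketch of applying the profile and reverting it later if old
clients turn out to need the legacy behaviour (the profile names accepted
depend on the release):

  ceph osd crush tunables optimal   # switch to the current optimal tunables
  ceph osd crush tunables legacy    # revert to legacy tunables for old clients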
sage
I have one non-default pool with 3x replication. Fewer than half of the pgs
have expanded to 3x (278/400 pgs still have 2x acting sets).
Where can I look for the trouble?
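For reference, a few commands that can help pinpoint where the stuck pgs are
(the pg id below is a placeholder):

  ceph pg dump_stuck unclean   # list pgs stuck unclean
  ceph pg <pgid> query         # acting/up sets and recovery state for one pg
  ceph osd dump | grep pool    # confirm each pool's replication (size) setting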
Thank you for any light someone can shed on this.
Cheers,
-Gaylord
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com