Here is the output from ceph osd dump -o -:

epoch 19
fsid c2ae7ab2-d1b2-a467-be6e-f9a0031840f5
created 2011-04-04 13:27:06.857950
modified 2011-04-08 14:11:05.899596
flags

pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 1 'metadata' pg_pool(rep pg_size 2 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 2 'casdata' pg_pool(rep pg_size 2 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 3 'rbd' pg_pool(rep pg_size 2 crush_ruleset 3 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner 0)

max_osd 6
osd0 up in weight 1 up_from 14 up_thru 0 down_at 13 last_clean_interval 10-12 10.6.1.92:6800/17641 10.6.1.92:6801/17641 10.6.1.92:6802/17641
osd1 up in weight 1 up_from 4 up_thru 6 down_at 0 last_clean_interval 0-0 10.6.1.93:6800/31106 10.6.1.93:6801/31106 10.6.1.93:6802/31106
osd2 up in weight 1 up_from 17 up_thru 0 down_at 16 last_clean_interval 15-16 10.6.1.94:6800/2740 10.6.1.94:6803/2740 10.6.1.94:6804/2740
osd3 up in weight 1 up_from 18 up_thru 0 down_at 0 last_clean_interval 0-0 10.6.1.95:6800/32038 10.6.1.95:6801/32038 10.6.1.95:6802/32038

I have not removed any OSDs from the cluster. I created the cluster with a single mds/mon and have been adding OSDs slowly.

Mark Nigh
Systems Architect
mnigh@xxxxxxxxxxxxxxx
(p) 314.392.6926

-----Original Message-----
From: Wido den Hollander [mailto:wido@xxxxxxxxx]
Sent: Friday, April 08, 2011 1:38 PM
To: Mark Nigh
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: PGs Degraded

Hi Mark,

On Fri, 2011-04-08 at 12:09 -0500, Mark Nigh wrote:
> I have recently built a ceph cluster with the following nodes:
>
> 2011-04-08 11:54:08.038841    pg v3661: 264 pgs: 264 active+clean+degraded; 9079 MB data, 9234 MB used, 811 GB / 820 GB avail; 2319/4638 degraded (50.000%)
> 2011-04-08 11:54:08.039492   mds e17: 2/2/2 up {0=up:active,1=up:active}
> 2011-04-08 11:54:08.039529   osd e18: 4 osds: 4 up, 4 in
> 2011-04-08 11:54:08.039592   log 2011-04-08 10:08:09.135994 mds0 10.6.1.90:6800/16761 4 : [INF] closing stale session client4142 10.6.1.62:0/667143763 after 304.524869
> 2011-04-08 11:54:08.039673   mon e1: 1 mons at {0=10.6.1.90:6789/0}
>

That seems odd; your "data" is only about 200 MB less than "used". What is the replication size for the "data" and "metadata" pools?

$ ceph osd dump -o -

(rep pg_size)

> I have a few files in the cluster (not much data) but have noticed from the beginning of the build (after the 2nd OSD) that some of my PGs are degraded.
>
> How do I fix this and is there a tool/command to assist in determining which PGs are degraded?

You can view your degraded PGs with:

$ ceph pg dump -o -

This will tell you the degraded PGs and which OSDs they are on.

Did you remove any OSDs from this cluster? It seems very odd that the cluster is in a 50% degraded state.

Wido

> Ceph -v is as follows:
>
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
>
> I appreciate the help.
>
> Mark Nigh
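
A quick way to answer Wido's question about the replication size directly from the dump above is to filter the pool lines (a small sketch against the 0.26 output pasted in this message; the field to look at is "rep pg_size"):

$ ceph osd dump -o - | grep pg_pool

Every pool line in the dump shows "rep pg_size 2", i.e. two copies of each object, which lines up with Wido's observation that "used" is barely larger than "data".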
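
And to narrow Wido's "ceph pg dump -o -" suggestion down to just the affected placement groups, a simple filter should do (assuming the degraded PGs carry "degraded" in their state column, matching the "active+clean+degraded" state in the status output quoted above):

$ ceph pg dump -o - | grep degraded

As Wido mentions, the OSD columns of the matching lines then show which OSDs each degraded PG maps to.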