RE: PGs Degraded


I found the problem: my CRUSH map had only one device in it (the first OSD). I have added the other three, and the cluster is now active+clean.
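For anyone hitting the same state, the fix above can be sketched with the standard crushtool round-trip. The filenames and the devices section below are illustrative examples, not taken from the actual map in this thread:

```shell
# Export the live CRUSH map and decompile it to editable text
# (these two commands need a running cluster, so they are shown
# commented out here):
#   ceph osd getcrushmap -o crushmap.bin
#   crushtool -d crushmap.bin -o crushmap.txt

# The devices section must list every OSD; with only osd.0 present,
# the other three never receive any placements. After the edit the
# section should read something like:
cat > crushmap.txt <<'EOF'
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
EOF

grep -c '^device' crushmap.txt   # counts 4 device lines

# Recompile the text map and inject it back into the cluster:
#   crushtool -c crushmap.txt -o crushmap.new
#   ceph osd setcrushmap -i crushmap.new
```

Note that the host/rack buckets and rules in the map must also reference the new devices, or CRUSH still will not place data on them.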

Mark Nigh
Systems Architect
mnigh@xxxxxxxxxxxxxxx
 (p) 314.392.6926



-----Original Message-----
From: Wido den Hollander [mailto:wido@xxxxxxxxx]
Sent: Friday, April 08, 2011 3:14 PM
To: Mark Nigh
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: PGs Degraded

Hi,
----- Original message -----
> Here is the output from ceph osd dump -o -
>
> epoch 19
> fsid c2ae7ab2-d1b2-a467-be6e-f9a0031840f5
> created 2011-04-04 13:27:06.857950
> modifed 2011-04-08 14:11:05.899596
> flags
>
> pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 1 owner
> 0) pg_pool 1 'metadata' pg_pool(rep pg_size 2 crush_ruleset 1
> object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2
> last_change 1 owner 0) pg_pool 2 'casdata' pg_pool(rep pg_size 2
> crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2
> lpgp_num 2 last_change 1 owner 0) pg_pool 3 'rbd' pg_pool(rep pg_size 2
> crush_ruleset 3 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2
> lpgp_num 2 last_change 1 owner 0)
>
> max_osd 6

Did you change the max_osd yourself? The replication size seems correct.

> I have not removed any OSDs from the cluster. I created the cluster with
> a single mds/mon and have been adding OSDs slowly.

Did you start with a single OSD initially? Did you add the new OSDs to the CRUSH map after adding them to the cluster?
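One quick way to check that is to decompile the map and count its devices against the number of OSDs the cluster reports as up/in. This is a sketch; the filenames and the one-device map below are hypothetical examples of the failure mode, not output from this cluster:

```shell
# Decompile the current CRUSH map (needs a live cluster, so shown
# commented out):
#   ceph osd getcrushmap -o cm.bin
#   crushtool -d cm.bin -o cm.txt

# If only the first OSD ever made it into the map, the devices
# section is just one line:
cat > cm.txt <<'EOF'
device 0 osd.0
EOF

devices=$(grep -c '^device' cm.txt)
echo "$devices device(s) in the CRUSH map"   # here 1, versus 4 OSDs up/in
```

A mismatch between that count and `osd e18: 4 osds: 4 up, 4 in` would explain the permanent 50% degradation: replicas can only be placed on devices CRUSH knows about.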

Wido

>
> Mark Nigh
> Systems Architect
> mnigh@xxxxxxxxxxxxxxx
>   (p) 314.392.6926
>
>
>
> -----Original Message-----
> From: Wido den Hollander [mailto:wido@xxxxxxxxx]
> Sent: Friday, April 08, 2011 1:38 PM
> To: Mark Nigh
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: PGs Degraded
>
> Hi Mark,
>
> On Fri, 2011-04-08 at 12:09 -0500, Mark Nigh wrote:
> > I have recently built a ceph cluster with the following nodes:
> >
> > 2011-04-08 11:54:08.038841       pg v3661: 264 pgs: 264
> > active+clean+degraded; 9079 MB data, 9234 MB used, 811 GB / 820 GB
> > avail; 2319/4638 degraded (50.000%) 2011-04-08 11:54:08.039492     mds
> > e17: 2/2/2 up {0=up:active,1=up:active} 2011-04-08 11:54:08.039529
> > osd e18: 4 osds: 4 up, 4 in 2011-04-08 11:54:08.039592     log
> > 2011-04-08 10:08:09.135994 mds0 10.6.1.90:6800/16761 4 : [INF] closing
> > stale session client4142 10.6.1.62:0/667143763 after 304.524869
> > 2011-04-08 11:54:08.039673     mon e1: 1 mons at {0=10.6.1.90:6789/0}
> >
>
> That seems odd, your "data" is only 200MB less than "used". What is the
> replication size for the "data" and "metadata" pools?
>
> $ ceph osd dump -o - (rep pg_size)
>
> > I have a few files in the cluster (not much data) but have noticed
> > from the beginning of the build (after the 2 osd) that some of my PGs
> > are degraded.
> >
> > How do I fix this and is there a tool/command to assist in determining
> > what PGs are degraded?
>
> You can view your degraded PGs with:
>
> $ ceph pg dump -o -
>
> This will tell you which PGs are degraded and which OSDs they are on.
>
> Did you remove any OSDs from this cluster? It seems very odd that the
> cluster is in a 50% degraded state.
>
> Wido
>
> >
> > Ceph -v is as follows:
> >
> > ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> >
> > I appreciate the help.
> >
> > Mark Nigh
> >
> > This transmission and any attached files are privileged, confidential
> > or otherwise the exclusive property of the intended recipient or
> > Netelligent Corporation. If you are not the intended recipient, any
> > disclosure, copying, distribution or use of any of the information
> > contained in or attached to this transmission is strictly prohibited.
> > If you have received this transmission in error, please contact us
> > immediately by responding to this message or by telephone
> > (314-392-6900) and promptly destroy the original transmission and its
> > attachments. -- To unsubscribe from this list: send the line
> > "unsubscribe ceph-devel" in the body of a message to
> > majordomo@xxxxxxxxxxxxxxx More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
>
>
>

