Re: pgs stuck unclean -- how to fix? (fwd)

Wido den Hollander <wido@xxxxxxxx> · Fri, 09 Aug 2013 11:08:58 +0200

On 08/09/2013 10:58 AM, Jeff Moskow wrote:
Hi,

	I have a 5-node Ceph cluster that is running well (no problems using any of the
rbd images, which is really all we use).

	I have replication set to 3 on all three pools (data, metadata and rbd).

	"ceph -s" reports:
    		health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; recovery 5746/384795 degraded (1.493%)
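
For readers who want to track this status line over time, here is a minimal Python sketch that pulls the counts out of the health string quoted above and recomputes the degraded percentage. The function name parse_health and the regular expressions are illustrative assumptions fitted to this one line, not part of any Ceph tooling.

```python
import re

# Health line copied from the "ceph -s" output quoted in this message.
HEALTH = ("health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; "
          "recovery 5746/384795 degraded (1.493%)")


def parse_health(line):
    """Hypothetical helper: extract PG counts and the degraded ratio."""
    # "<n> pgs <state>" entries, each terminated by ';' or end of line.
    counts = {state: int(n)
              for n, state in re.findall(r"(\d+) pgs ([\w ]+?)(?:;|$)", line)}
    # "recovery <degraded>/<total> degraded"
    m = re.search(r"recovery (\d+)/(\d+) degraded", line)
    degraded, total = int(m.group(1)), int(m.group(2))
    pct = 100.0 * degraded / total
    return counts, degraded, total, pct


counts, degraded, total, pct = parse_health(HEALTH)
print(counts)
print(f"{degraded}/{total} = {pct:.3f}%")
```

The recomputed ratio agrees with the 1.493% that ceph reports, so the percentage is simply the first count divided by the second.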

	I have tried everything I could think of to clear/fix those errors and they persist.

Did you restart the primary OSD for those PGs?

Wido
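
To see which OSDs that would mean, here is a rough Python sketch that groups the PGs from Jeff's listing by their primary OSD, taking the primary to be the first OSD in the acting set. The acting sets were copied by hand from the listing in this thread; the names ACTING and pgs_by_primary are made up for illustration, and only the ten PGs shown are covered, not all 114.

```python
from collections import defaultdict

# pgid -> acting set, copied by hand from the listing in this thread.
ACTING = {
    "0.2a0": [4, 7, 8],
    "4.1d9": [6, 13, 4],
    "1.1dc": [6, 13, 4],
    "0.1dd": [6, 13, 4],
    "1.29f": [4, 7, 8],
    "1.118": [8, 15, 5],
    "0.119": [8, 15, 5],
    "4.115": [8, 15, 5],
    "4.4a": [4, 6],
    "0.4e": [4, 6, 1],
}


def pgs_by_primary(acting):
    """Hypothetical helper: group PG ids by primary OSD.

    The primary is taken to be the first OSD in each acting set.
    """
    groups = defaultdict(list)
    for pgid, osds in acting.items():
        groups[osds[0]].append(pgid)
    return dict(groups)


for osd, pgs in sorted(pgs_by_primary(ACTING).items()):
    print(f"osd.{osd}: {', '.join(pgs)}")
```

For the ten PGs listed, the primaries come out as osd.4, osd.6 and osd.8, so those would be the candidates for a restart.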

	Most of them appear to be a problem with not having 3 copies....

0.2a0  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:40:07.874427  0'0          21920'388     [4,7]   [4,7,8]   0'0  2013-08-04 08:59:34.035198  0'0  2013-07-29 01:49:40.018625
4.1d9  260  0  238  0  1021055488  0  0  active+remapped  2013-08-06 05:56:20.447612  21920'12710  21920'53408   [6,13]  [6,13,4]  0'0  2013-08-05 06:59:44.717555  0'0  2013-08-05 06:59:44.717555
1.1dc  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:55:44.687830  0'0          21920'3003    [6,13]  [6,13,4]  0'0  2013-08-04 10:56:51.226012  0'0  2013-07-28 23:47:13.404512
0.1dd  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:55:44.687525  0'0          21920'3003    [6,13]  [6,13,4]  0'0  2013-08-04 10:56:45.258459  0'0  2013-08-01 05:58:17.141625
1.29f  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:40:07.882865  0'0          21920'388     [4,7]   [4,7,8]   0'0  2013-08-04 09:01:40.075441  0'0  2013-07-29 01:53:10.068503
1.118  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:50:34.081067  0'0          21920'208     [8,15]  [8,15,5]  0'0  2034-02-12 23:20:03.933842  0'0  2034-02-12 23:20:03.933842
0.119  0    0  0    0  0           0  0  active+remapped  2013-08-06 05:50:34.095446  0'0          21920'208     [8,15]  [8,15,5]  0'0  2034-02-12 23:18:07.310080  0'0  2034-02-12 23:18:07.310080
4.115  248  0  226  0  987364352   0  0  active+remapped  2013-08-06 05:50:34.112139  21920'6840   21920'42982   [8,15]  [8,15,5]  0'0  2013-08-05 06:59:18.303823  0'0  2013-08-05 06:59:18.303823
4.4a   241  0  286  0  941573120   0  0  active+degraded  2013-08-06 12:00:47.758742  21920'85238  21920'206648  [4,6]   [4,6]     0'0  2013-08-05 06:58:36.681726  0'0  2013-08-05 06:58:36.681726
0.4e   0    0  0    0  0           0  0  active+remapped  2013-08-06 12:00:47.765391  0'0          21920'489     [4,6]   [4,6,1]   0'0  2013-08-04 08:58:12.783265  0'0  2013-07-28 14:21:38.227970
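
The "not having 3 copies" pattern can be checked mechanically. The Python sketch below takes three rows from the listing above (trimmed after the two bracketed OSD sets) and compares the first set with the second. It assumes the first bracketed set is the up set and the second is the acting set, that the columns sit at the positions they have in these rows, and that the pool size is 3 as stated earlier. The helper names are hypothetical.

```python
# Three rows copied from the listing above, trimmed after the acting column.
SAMPLE = """\
0.2a0 0 0 0 0 0 0 0 active+remapped 2013-08-06 05:40:07.874427 0'0 21920'388 [4,7] [4,7,8]
4.4a 241 0 286 0 941573120 0 0 active+degraded 2013-08-06 12:00:47.758742 21920'85238 21920'206648 [4,6] [4,6]
0.4e 0 0 0 0 0 0 0 active+remapped 2013-08-06 12:00:47.765391 0'0 21920'489 [4,6] [4,6,1]
"""


def parse_osd_set(text):
    """Turn '[4,7,8]' into [4, 7, 8]; '[]' becomes an empty list."""
    inner = text.strip("[]")
    return [int(x) for x in inner.split(",")] if inner else []


def summarize(line, pool_size=3):
    """Hypothetical helper: compare the up and acting sets of one row.

    Assumed whitespace-separated positions: 0 = pgid, 8 = state,
    13 = up set, 14 = acting set (the timestamp counts as two fields).
    """
    f = line.split()
    up, acting = parse_osd_set(f[13]), parse_osd_set(f[14])
    return {
        "pgid": f[0],
        "state": f[8],
        "up": up,
        "acting": acting,
        "missing_from_up": sorted(set(acting) - set(up)),
        "up_short_by": pool_size - len(up),
        "acting_short_by": pool_size - len(acting),
    }


for row in SAMPLE.splitlines():
    print(summarize(row))
```

In these rows the remapped PGs have only two OSDs in the up set while three OSDs are still acting, and the degraded PG 4.4a has only two OSDs in both sets.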

	Can anyone suggest a way to clear this up?

Thanks!
	Jeff

--
Wido den Hollander
42on B.V.
