Stuck pages and other bad things

I'm presuming this is the correct list (rather than the -devel list); please correct me if I'm wrong there.

I set up Ceph (0.56.4) a few months ago with two disk servers and one dedicated monitor host. The disk servers also run monitors, so there are a total of 3 monitors for the cluster. Each of the disk servers has 8 OSDs.

I didn't actually save a 'ceph osd tree' output from back then, but cutting and pasting the relevant parts from what I have now, it probably looked like this:

# id weight type name up/down reweight
-1 16 root default
-3 16 rack unknownrack
-2 0 host leviathan
100 1 osd.100 up 1
101 1 osd.101 up 1
102 1 osd.102 up 1
103 1 osd.103 up 1
104 1 osd.104 up 1
105 1 osd.105 up 1
106 1 osd.106 up 1
107 1 osd.107 up 1
-4 8 host minotaur
200 1 osd.200 up 1
201 1 osd.201 up 1
202 1 osd.202 up 1
203 1 osd.203 up 1
204 1 osd.204 up 1
205 1 osd.205 up 1
206 1 osd.206 up 1
207 1 osd.207 up 1

A couple of weeks ago, for valid reasons that aren't relevant here, we decided to repurpose one of the disk servers (leviathan) and replace its role in the Ceph cluster with some other hardware. I created a new server (aergia). That changed the 'ceph osd tree' to this:

# id weight type name up/down reweight
-1 16 root default
-3 16 rack unknownrack
-2 0 host leviathan
100 1 osd.100 up 1
101 1 osd.101 up 1
102 1 osd.102 up 1
103 1 osd.103 up 1
104 1 osd.104 up 1
105 1 osd.105 up 1
106 1 osd.106 up 1
107 1 osd.107 up 1
-4 8 host minotaur
200 1 osd.200 up 1
201 1 osd.201 up 1
202 1 osd.202 up 1
203 1 osd.203 up 1
204 1 osd.204 up 1
205 1 osd.205 up 1
206 1 osd.206 up 1
207 1 osd.207 up 1
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1
3 1 osd.3 up 1
4 1 osd.4 up 1
5 1 osd.5 up 1
6 1 osd.6 up 1
7 1 osd.7 up 1

Everything was looking happy, so I began removing the OSDs on leviathan. That's when the problems started. 'ceph health detail' shows several placement groups (PGs) that either existed only on that disk server, e.g.
pg 0.312 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [103]
or PGs that were only replicated onto OSDs on that same host, e.g.
pg 0.2f4 is stuck unclean since forever, current state stale+active+remapped, last acting [106,101]
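
For context, the removal procedure I was following for each OSD was, as best I remember, the standard manual sequence; commands reconstructed from memory, with osd.100 just as an example:

    ceph osd out 100
    service ceph stop osd.100        # on leviathan itself (or however your init system stops the daemon)
    ceph osd crush remove osd.100
    ceph auth del osd.100
    ceph osd rm 100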

I brought leviathan back up, and I *think* everything is at least responding now. But 'ceph health' still shows
HEALTH_WARN 302 pgs degraded; 810 pgs stale; 810 pgs stuck stale; 3562 pgs stuck unclean; recovery 44951/2289634 degraded (1.963%)
...and it's been stuck there for a long time.
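
If more detail would help, I can post the output of something like the following for any of the stuck PGs (using 0.312 from the health output above as an example):

    ceph pg dump_stuck stale
    ceph pg dump_stuck unclean
    ceph pg map 0.312        # up/acting OSD sets for that PG
    ceph pg 0.312 query      # full state of that PG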

So my question is, how do I force data off the to-be-decommissioned server safely and get back to "HEALTH_OK"?
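
My naive guess is that I should first weight the remaining leviathan OSDs out of CRUSH so the data drains off them, wait for recovery to finish, and only then remove them; something like the following (osd.100 as an example, and very much a guess rather than something I've run):

    ceph osd crush reweight osd.100 0    # guess: migrate data off this OSD before removing it
    ceph -w                              # watch until the affected PGs are active+clean

...but I'd appreciate confirmation, or a better procedure, before I do that across all eight OSDs.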

