Re: Not recovering completely on OSD failure

Gregory Farnum <greg@xxxxxxxxxxx> · Fri, 8 Nov 2013 09:47:30 -0800

This is probably a result of some difficulties that CRUSH has when using pool sizes equal to the total number of buckets it can choose from. We made some changes to the algorithm earlier this year to deal with it, but if using a kernel client you need a very new one to be compatible so we haven't enabled them by default yet -- see the documentation on "crush tunables".
-Greg

On Friday, November 8, 2013, Niklas Goerke  wrote:
Hi guys

This is probably a configuration error, but I just can't find it.

The following reproduceable happens on my cluster [1].

15:52:15 On Host1 one disk is being removed on the RAID Controller (to ceph it looks as if the disk died)

15:52:52 OSD Reported missing (osd.47)

15:52:53 osdmap eXXX: 60 osds: 59 up, 60 in; 1,781% degraded, 436 PGs stuck unclean, 436 PGs degraded; not recovering yet

15:57:54 osdmap eXXX: 60 osds: 59 up, 59 in; start recovering

15:58:00 2,502% degraded

15:58:01 3,413% degraded; recovering at about 1GB/s --> recovering speed decreasing to about 40MB/s

17:02:10 10 PGs active+remapped, 218 PGs active+degraded, 0.898% degraded, stopped recovering

18:12 Still not recovering

few days later: OSD removed [2], now recovering completely

I would like my cluster to recover completely without me interfering. Can anyone give an educated guess what went wrong here? I can't find the reason why the cluster would just stop recovering.

Thank you for any hints!

Niklas

[1] 4 OSD Hosts with 15 disks each. On each of the 60 identical disks there is one OSD. I have one large pool with 6000 PGs and a replica size of 4, and 3 (default) pools with 64 PGs each

[2] http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com