Re: Not recovering completely on OSD failure

This is probably a result of a known difficulty CRUSH has when a pool's replica size equals the total number of buckets it can choose from (here, size 4 across 4 hosts). We made some changes to the algorithm earlier this year to deal with it, but if you are using a kernel client you need a very new kernel to be compatible, so we haven't enabled them by default yet -- see the documentation on "crush tunables".
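
For reference, a minimal sketch of inspecting and switching the tunables (assuming a release that ships the "ceph osd crush tunables" command; test against your clients first, since the kernel cutoff depends on the profile):

   # show the currently active tunables
   ceph osd crush show-tunables

   # switch to the newer algorithm; old kernel clients
   # (pre-3.9 for the bobtail profile) will stop working
   ceph osd crush tunables optimal

   # revert to the legacy behavior if clients break
   ceph osd crush tunables legacy
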
-Greg

On Friday, November 8, 2013, Niklas Goerke wrote:
Hi guys

This is probably a configuration error, but I just can't find it.
The following reproducibly happens on my cluster [1].

15:52:15 On Host1, one disk is removed via the RAID controller (to Ceph it looks as if the disk died)
15:52:52 OSD reported missing (osd.47)
15:52:53 osdmap eXXX: 60 osds: 59 up, 60 in; 1.781% degraded, 436 PGs stuck unclean, 436 PGs degraded; not recovering yet
15:57:54 osdmap eXXX: 60 osds: 59 up, 59 in; recovery starts
15:58:00 2.502% degraded
15:58:01 3.413% degraded; recovering at about 1 GB/s --> recovery speed decreasing to about 40 MB/s
17:02:10 10 PGs active+remapped, 218 PGs active+degraded, 0.898% degraded, recovery stopped
18:12 Still not recovering
A few days later: OSD removed [2]; the cluster then recovered completely
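
For reference, the degraded/stuck state above can be inspected with the standard commands (a sketch; <pgid> is a placeholder for one of the stuck PG ids):

   ceph -s                      # overall cluster and recovery state
   ceph health detail           # lists the stuck/degraded PGs
   ceph pg dump_stuck unclean   # PGs that never returned to active+clean
   ceph pg <pgid> query         # per-PG detail on why recovery stalled
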

I would like my cluster to recover completely without my intervention. Can anyone give an educated guess as to what went wrong here? I can't see any reason why the cluster would simply stop recovering.

Thank you for any hints!
Niklas


[1] 4 OSD hosts with 15 disks each; each of the 60 identical disks carries one OSD. I have one large pool with 6000 PGs and a replica size of 4, plus the 3 default pools with 64 PGs each.
[2] http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
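
For reference, the manual removal procedure in [2] boils down to the following (a sketch, using osd.47 from above as the example id):

   ceph osd out 47                # mark the OSD out so data migrates off it
   # stop the ceph-osd daemon on its host, then:
   ceph osd crush remove osd.47   # remove it from the CRUSH map
   ceph auth del osd.47           # delete its authentication key
   ceph osd rm 47                 # remove the OSD from the cluster map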


--
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
