On 08/30/2013 08:19 PM, Geraint Jones wrote:
Hi Guys,

We are using Ceph in production backing an LXC cluster. The setup is: 2 x servers, each with 24 x 3TB disks in groups of 3 as RAID0, SSDs for journals, and bonded 1Gbit ethernet (2Gbit total).
I think you sized your machines too big. I'd say go for 6 machines with 8 disks each, without RAID-0. Let Ceph do its job and avoid RAID: a 3-disk RAID0 turns a single disk failure into the loss of a whole 9TB OSD, so there is far more data to recover.
Machines this big only work in a very large cluster.
Overnight we have had a disk failure; this in itself is not a biggie, but due to the number of VMs we have spawning and shutting down we are seeing serious problems.
Are you using CephFS? I assume so, given that you are running LXC?
As I understand it, Ceph will do on-demand recovery when a request is made for a degraded object? Is it possible to make this recovery traffic go via a different network? I was contemplating adding a 10GbE crossover between the servers to ensure this copy can happen super fast.
Yes, you can use "cluster_network" to direct inter-OSD replication and recovery traffic over a different network interface, while client traffic stays on the public network.
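For example, a minimal sketch of ceph.conf on both nodes (the subnets here are placeholders, substitute your own ranges):

    [global]
    # client <-> OSD traffic stays on the existing bonded 1GbE links
    public network = 192.168.0.0/24
    # inter-OSD replication/recovery goes over the 10GbE crossover
    cluster network = 10.10.10.0/24

The OSDs need a restart to pick this up, and both machines must be able to reach each other on the cluster network, since OSD heartbeats will use it as well.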
Wido
If anyone has any suggestions on how to avoid this horrible I/O performance hit during recovery, let me know.

Thanks
Geraint
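Besides moving recovery to its own network, you can throttle recovery so client I/O keeps priority. A sketch using the standard OSD options (the values are just a conservative starting point, not tuned for your setup):

    # inject at runtime into all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

    # and make it persistent in ceph.conf
    [osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1

Lowering these makes recovery take longer, but keeps the cluster usable while it runs.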
--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on