On 30/08/13 11:33 AM, "Wido den Hollander" <wido@xxxxxxxx> wrote:

>On 08/30/2013 08:19 PM, Geraint Jones wrote:
>> Hi Guys
>>
>> We are using Ceph in production backing an LXC cluster. The setup is: 2
>> x servers, each with 24 x 3TB disks in groups of 3 as RAID0, SSD for
>> journals, and bonded 1gbit ethernet (2gbit total).
>>
>
>I think you sized your machines too big. I'd say go for 6 machines with
>8 disks each without RAID-0. Let Ceph do its job and avoid RAID.

Typical traffic is fine - it's just been an issue tonight :)

>
>Such big machines only work in a very large cluster.
>
>> Overnight we had a disk failure. This in itself is not a biggie, but
>> due to the number of VMs we have spawning/shutting down we are seeing
>> serious problems.
>>
>
>Are you using CephFS? I assume so with LXC?

No, we are mounting RBD images and using an overlayfs on top of them.

>
>> As I understand it, Ceph will do on-demand recovery when a request is
>> made for a degraded object? Is it possible to make this recovery
>> traffic go via a different network? I was contemplating adding a 10GbE
>> crossover between the servers to ensure this copy can happen super fast.
>>
>
>Yes, you can use "cluster_network" to direct OSD traffic over different
>network interfaces.

Perfect, so now to buy some NICs :)

>
>Wido
>
>> If anyone has any suggestions on how to avoid this horrible I/O
>> performance hit during recovery, let me know.
>>
>> Thanks
>>
>> Geraint
>
>--
>Wido den Hollander
>42on B.V.
>
>Phone: +31 (0)20 700 9902
>Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
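
For reference, a minimal sketch of what the "cluster_network" change could look like in ceph.conf, assuming the existing bonded 1gbit LAN is 192.168.1.0/24 and the new 10GbE crossover is addressed as 10.10.10.0/24 (both subnets are placeholders, not values taken from this thread):

    [global]
    # client and monitor traffic stays on the existing bonded 1gbit network
    public_network = 192.168.1.0/24
    # OSD replication, recovery and backfill traffic moves to the 10GbE crossover
    cluster_network = 10.10.10.0/24

The OSDs have to be restarted to bind to the new cluster address; clients and monitors keep using the public network, so only OSD-to-OSD traffic moves onto the crossover link.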