Hi Christian,

Thanks for your response.

>> We are using 0.45 in production. Recent ceph versions are quite stable
>> (although we had some troubles with excessive logging and a full log
>> partition lately which caused our cluster to halt).

Was the excessive logging caused by a configuration error?

>> For the moment I would definitely recommend using XFS as the
>> underlying filesystem, at least until there is a fix for the
>> orphan_commit_root problem. XFS comes with a slight performance
>> impact, but it seems to be the only filesystem that is able to handle
>> heavy ceph workloads for the moment.

What is the benefit of using btrfs? Snapshots? (I would like to be able
to do snapshots, and maybe clones.)

>> We are running a small ceph cluster (4 servers with 4 OSDs each) on a
>> 10GE network. Servers are spread across two datacenters with a 5km
>> (3 mile) long 10GE fibre-link for data replication. Our servers are
>> equipped with 80GB Fusion-IO drives (for the journal) and traditional
>> 3.5'' SAS drives in a RAID5 configuration (but I would not recommend
>> this setup). From a guest we can get a throughput of ~500MB/s.

Great! (And from multiple guests, do you get more total throughput?)

Also, about latencies: do you get good latencies with your Fusion-IO
journal? I currently use ZFS storage, where writes go to a fast NVRAM
journal first and are then flushed to 15K disks. Is it the same
behaviour with ceph?

>> This is probably the best hardware for a ceph cluster money can buy.
>> Are you planning a single SAS drive per OSD?

Yes, one OSD per drive. That way, if something goes wrong with btrfs or
XFS, I'll only lose one disk and not the whole RAID. Is that the right
way to run OSDs?

>> I still don't know the cause exactly, but we are not able to saturate
>> 10GE (maybe it's the latency on the WAN link or some network
>> configuration problem).

Yes, maybe. (I wish I had the money for this kind of setup ;)

>> I did some artificial tests with btrfs with large metadata enabled
>> (e.g. mkfs.btrfs -l 64k -n 64k) and the performance degradation seems
>> to be gone.

Great! (I'm quite scared of this kind of bug.)

>> We are using bonding. The rados client does a failover to another OSD
>> node after a few seconds, when there is no response from the OSD.
>> (You should read about CRUSH in the ceph docs.)

Thanks again for all your answers. (The ceph community seems to be
great :)

Regards,

Alexandre
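PS: to make the snapshot question a bit more concrete, this is roughly
what I hope to be able to do from a client. It is only a sketch: the
image and snapshot names are made up, and the syntax is taken from my
reading of the rbd man page, so please correct me if I got it wrong.

    # create a 10GB image in the default rbd pool
    rbd create --size 10240 vm-disk-1

    # take a snapshot before touching the guest, then list snapshots
    rbd snap create --snap before-upgrade vm-disk-1
    rbd snap ls vm-disk-1

    # roll the image back if something goes wrong
    rbd snap rollback --snap before-upgrade vm-disk-1

Cloning a snapshot into a new writable image is the part I am least
sure is supported today, which is why I ask.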
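PS2: for the one-OSD-per-drive layout with the ZeusRAM journal, I
imagine a ceph.conf along the lines of the snippet below. Again only a
sketch with made-up paths and hostnames, and I don't know yet whether a
raw partition or a file on the NVRAM device is the better choice for
the journal:

    [osd]
        ; one data directory per OSD, each on its own SAS/SSD drive
        osd data = /srv/ceph/osd.$id
        ; per-OSD journal file on the NVRAM device mounted at /nvram
        osd journal = /nvram/journal.$id
        osd journal size = 1024

    [osd.0]
        host = node1

    [osd.1]
        host = node1

Does that roughly match what you are doing with the Fusion-IO drives,
or do you give each OSD its own journal partition?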
----- Original Message -----
From: "Christian Brunner" <christian@xxxxxxxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Sent: Friday, 18 May 2012, 10:45:48
Subject: Re: is rados block cluster production ready ?

2012/5/18 Alexandre DERUMIER <aderumier@xxxxxxxxx>:
> Hi,
> I'm going to build a rados block cluster for my kvm hypervisors.
>
> Is it already production ready? (stable, no crashes)

We are using 0.45 in production. Recent ceph versions are quite stable
(although we had some troubles with excessive logging and a full log
partition lately which caused our cluster to halt).

> I have read about some btrfs bugs on this mailing list, so I'm a bit scared...

For the moment I would definitely recommend using XFS as the
underlying filesystem, at least until there is a fix for the
orphan_commit_root problem. XFS comes with a slight performance
impact, but it seems to be the only filesystem that is able to handle
heavy ceph workloads for the moment.

> Also, what performance could I expect?

We are running a small ceph cluster (4 servers with 4 OSDs each) on a
10GE network. Servers are spread across two datacenters with a 5km
(3 mile) long 10GE fibre-link for data replication. Our servers are
equipped with 80GB Fusion-IO drives (for the journal) and traditional
3.5'' SAS drives in a RAID5 configuration (but I would not recommend
this setup). From a guest we can get a throughput of ~500MB/s.

> I'm trying to build a fast cluster, with fast SSD disks.
> Each node: 8 OSDs with "OCZ Talos" SAS drives + a STEC ZeusRAM drive (8GB NVRAM) for the journal + 10GbE networking.
> Do you think I can saturate the 10GbE?

This is probably the best hardware for a ceph cluster money can buy.
Are you planning a single SAS drive per OSD?

I still don't know the cause exactly, but we are not able to saturate
10GE (maybe it's the latency on the WAN link or some network
configuration problem).

> I also have some questions about performance over time.
> I have had some problems with my ZFS SAN, with fragmentation and metaslab problems.
> How does btrfs perform over time?

I did some artificial tests with btrfs with large metadata enabled
(e.g. mkfs.btrfs -l 64k -n 64k) and the performance degradation seems
to be gone.

> About the network, does the rados protocol support some kind of multipathing? Or do I need to use bonding/LACP?

We are using bonding. The rados client does a failover to another OSD
node after a few seconds, when there is no response from the OSD.
(You should read about CRUSH in the ceph docs.)

Regards,
Christian

--
Alexandre Derumier
Systems Engineer
Phone: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc, 59100 Roubaix - France
12 rue Marivaux, 75002 Paris - France
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html