> I am looking at evaluating Ceph for use with large storage nodes (24-36
> SATA disks per node, 3 or 4TB per disk, HBAs, 10G ethernet).
>
> What would be the best practice for deploying this? I can see two main
> options.
>
> (1) Run 24-36 OSDs per node. Configure Ceph to replicate data to one or
> more other nodes. This means that if a disk fails, there will have to be
> an operational process to stop the OSD, unmount and replace the disk,
> mkfs a new filesystem, mount it, and restart the OSD - which could be
> more complicated and error-prone than a RAID swap would be.
>
> (2) Combine the disks using some sort of RAID (or ZFS raidz/raidz2), and
> run one OSD per node. In this case:
> * if I use RAID0 or LVM, then a single disk failure will cause all the
>   data on the node to be lost and rebuilt
> * if I use RAID5/6, then write performance is likely to be poor
> * if I use RAID10, then capacity is reduced by half; with Ceph
>   replication each piece of data will be replicated 4 times (twice on
>   one node, twice on the replica node)
>
> It seems to me that (1) is what Ceph was designed to achieve, maybe with
> 2 or 3 replicas. Is this what's recommended?

There is a middle ground to consider: 12-18 OSDs, each running on a pair
of disks in a RAID1 configuration. This would reduce most disk failures to
a simple disk swap (assuming an intelligent hardware RAID controller).
Obviously you still have a 50% reduction in usable disk space, but you get
the advantage that your filesystem never sees the bad disk and all the
problems that can cause.

James
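For reference, the operational process described in option (1) looks
roughly like the outline below. This is only a sketch, not an endorsed
procedure: the OSD id (12), the device name (/dev/sdm), the CRUSH host
name, and the use of XFS with sysvinit-style "service ceph" scripts are
all placeholder assumptions for illustration.

    # 1. Take the dead OSD out of data placement and stop it.
    ceph osd out 12
    service ceph stop osd.12

    # 2. Remove it from the CRUSH map, auth database and OSD map; unmount.
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    umount /var/lib/ceph/osd/ceph-12

    # 3. Physically swap the disk, make a fresh filesystem, remount.
    mkfs.xfs -f /dev/sdm
    mount /dev/sdm /var/lib/ceph/osd/ceph-12

    # 4. Recreate the OSD, register its key, put it back into CRUSH, start.
    ceph osd create                 # allocates the lowest free id (12 here)
    ceph-osd -i 12 --mkfs --mkkey
    ceph auth add osd.12 osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-12/keyring
    ceph osd crush add osd.12 1.0 host=storage-node-1
    service ceph start osd.12

The "2 or 3 replicas" in the question are a per-pool setting (e.g.
"ceph osd pool set <pool> size 3"); Ceph places those replicas across
OSDs/hosts according to the CRUSH rules, independently of how the disks
underneath each OSD are arranged.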
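The RAID1-pair middle ground could look something like the following,
using Linux md software RAID as a stand-in for the hardware controller
mentioned above; again the device names (/dev/sdb, /dev/sdc, /dev/md0) and
the OSD id are placeholders.

    # Mirror two disks and put a single OSD's filesystem on the mirror.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.xfs /dev/md0
    mkdir -p /var/lib/ceph/osd/ceph-0
    mount /dev/md0 /var/lib/ceph/osd/ceph-0

    # On a disk failure the OSD keeps running; the swap reduces to:
    mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
    #   ... physically replace the disk ...
    mdadm /dev/md0 --add /dev/sdb

The OSD's filesystem only ever sees /dev/md0, so the failed member and the
rebuild are handled entirely below it, which is exactly the advantage of
the RAID1 pair noted above.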