I am looking at evaluating ceph for use with large storage nodes
(24-36 SATA disks per node, 3 or 4TB per disk, HBAs, 10G ethernet). What would be the best practice for deploying this? I can see two main options.

(1) Run 24-36 OSDs per node, and configure ceph to replicate data to one or more other nodes. This means that if a disk fails, there will have to be an operational process to stop the OSD, unmount and replace the disk, mkfs a new filesystem, mount it, and restart the OSD - which could be more complicated and error-prone than a RAID swap would be.

(2) Combine the disks using some sort of RAID (or ZFS raidz/raidz2), and run one OSD per node. In this case:

* if I use RAID0 or LVM, then a single disk failure will cause all the data on the node to be lost and rebuilt
* if I use RAID5/6, then write performance is likely to be poor
* if I use RAID10, then capacity is reduced by half; with ceph replication each piece of data will be replicated 4 times (twice on one node, twice on the replica node) - see the rough capacity sketch at the end of this mail

It seems to me that (1) is what ceph was designed to achieve, maybe with 2 or 3 replicas. Is this what's recommended?

I have seen some postings which imply one OSD per node: e.g.
http://www.sebastien-han.fr/blog/2012/08/17/ceph-storage-node-maintenance/
shows three nodes, each with one OSD - but maybe that was just a trivial example for simplicity.

Looking at http://ceph.com/docs/next/install/hardware-recommendations/ it says "You *may* run multiple OSDs per host" (my emphasis), and goes on to caution against having more disk bandwidth than network bandwidth. Ah, but at another point it says "We recommend using a dedicated drive for the operating system and software, and one drive for each OSD daemon you run on the host." So I guess that's fairly clear.

Any other options I should be considering?

Regards,

Brian.
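P.S. To make the capacity trade-off concrete, here is a rough back-of-the-envelope sketch in Python (purely illustrative). It assumes 24 disks of 4TB each per node - numbers taken from the range above - and ignores filesystem, journal and near-full overheads; it just compares usable capacity per node under the replication schemes discussed.

# Rough usable-capacity comparison for the schemes discussed above.
# Assumptions (illustrative only): 24 disks per node, 4 TB per disk;
# filesystem, journal and near-full overheads are ignored.

disks_per_node = 24
disk_tb = 4.0
raw_tb = disks_per_node * disk_tb   # 96 TB raw per node

schemes = {
    "one OSD per disk, 2x ceph replication":             raw_tb / 2,
    "one OSD per disk, 3x ceph replication":             raw_tb / 3,
    "RAID10 + 2x ceph replication (4 copies in total)":  raw_tb / 4,
}

for name, usable_tb in schemes.items():
    print("%-50s %5.1f TB usable (of %.0f TB raw)" % (name, usable_tb, raw_tb))

In other words, RAID10 underneath ceph's own replication costs half the usable capacity again compared with option (1) at 2 replicas.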