Building a petabyte cluster from scratch

Hi Ceph users!

After years of using Ceph, we plan to build a new cluster soon, bigger than
anything we've done in the past. The project is still at the design stage, so
I'd like to have your thoughts on the plan below: any feedback is welcome :)


## Requirements

 * ~1 PB usable space for file storage, extensible in the future
 * The files are mostly "hot" data, no cold storage
 * Purpose: storage for large files, mostly accessed from Windows workstations (10G access)
 * The more performance, the better :)


## Global design

 * 8+3 Erasure Coded pool
 * ZFS on RBD, exposed via Samba shares (clustered, with failover); a rough RBD provisioning sketch follows this list
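
As a rough illustration of the "ZFS on RBD" piece, here is a minimal sketch of
how an RBD image backed by the EC pool could be provisioned, using the
python3-rados / python3-rbd bindings. The pool names (rbd_meta, ec_data), the
image name and the size are placeholders, and it assumes the 8+3 EC pool
already exists with allow_ec_overwrites enabled and a Luminous-or-later librbd
(for the data_pool keyword):

```python
import rados
import rbd

# Placeholder pool names: a small replicated pool for RBD metadata/omap,
# and the 8+3 EC pool that holds the actual data objects.
METADATA_POOL = "rbd_meta"
EC_DATA_POOL = "ec_data"

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(METADATA_POOL)
    try:
        # 100 TiB thin-provisioned image that a Samba head would map,
        # format with ZFS and export as a share.
        rbd.RBD().create(ioctx, "zfs-share-01", 100 * 1024**4,
                         data_pool=EC_DATA_POOL)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```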


## Hardware

 * 1 rack (multi-site would be better, of course...)

 * OSD nodes: 14 x Supermicro servers
   * 24 usable bays in 2U of rack space
   * 16 x 10 TB nearline SAS HDDs (8 bays free for future needs)
   * 2 x Xeon Silver 4212 (12C/24T)
   * 128 GB RAM
   * 4 x 40G QSFP+

 * Networking: 2 x Cisco N3K 3132Q or 3164Q
   * 2 x 40G per server for the Ceph cluster network (LACP/vPC for HA)
   * 2 x 40G per server for the public network (LACP/vPC for HA)
   * QSFP+ DAC cables


## Sizing

If we've done the maths right, we expect to have:

 * 2.24 PB of raw storage, extensible to 3.36 PB by adding HDDs
 * 1.63 PB of expected usable space with 8+3 EC, extensible to 2.44 PB
 * ~1 PB of usable space if we want to keep OSD utilization under 66% so we
   can lose nodes without trouble, extensible to ~1.6 PB under the same condition
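
For reference, here is the arithmetic behind those figures as a small Python
snippet (the constants are just the numbers above; "usable" ignores any ZFS
and Samba overhead):

```python
# Back-of-the-envelope sizing for the cluster described above.
NODES = 14
DRIVE_TB = 10
EC_K, EC_M = 8, 3      # 8+3 erasure coding: usable fraction = k / (k + m)
MAX_FILL = 0.66        # keep OSDs under 66% so nodes can be lost safely

def sizing(drives_per_node):
    raw_pb = NODES * drives_per_node * DRIVE_TB / 1000
    usable_pb = raw_pb * EC_K / (EC_K + EC_M)
    safe_pb = usable_pb * MAX_FILL
    return raw_pb, usable_pb, safe_pb

for label, drives in (("today, 16 drives/node", 16),
                      ("fully populated, 24 drives/node", 24)):
    raw, usable, safe = sizing(drives)
    print(f"{label}: raw {raw:.2f} PB, usable {usable:.2f} PB, "
          f"at 66% fill {safe:.2f} PB")
```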


## Reflections

 * We're used to running mon and mgr daemons on a few of our OSD nodes, without
   any issue so far: is this a bad idea for a big cluster?

 * We thought about cache tiering on an SSD pool, but a large part of the PB is
   used on a daily basis, so we expect the cache would not be very effective
   while being really expensive. Is that a fair assumption?

 * Could a 2 x 10G network be enough?

 * ZFS on Ceph? Any thoughts?

 * What about CephFS? We'd like to use RBD diff for backups (see the sketch
   after this list), but it doesn't seem possible to do snapshot diffs with CephFS?
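
To make the RBD diff question concrete, this is roughly the incremental-backup
loop we have in mind, sketched with the librbd Python bindings. The pool,
image and snapshot names are placeholders, and a real backup job would ship
the changed extents somewhere instead of just counting them:

```python
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("rbd_meta")        # placeholder pool name
    with rbd.Image(ioctx, "zfs-share-01") as image:
        changed = []

        def collect(offset, length, exists):
            # exists=False marks extents discarded since the previous snapshot
            changed.append((offset, length, exists))

        image.create_snap("backup-today")
        # Extents that differ between yesterday's snapshot and the current data
        image.diff_iterate(0, image.size(), "backup-yesterday", collect)
        print(f"{len(changed)} changed extents to ship to the backup target")
    ioctx.close()
finally:
    cluster.shutdown()
```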


Thanks for reading, and for sharing your experiences!

F.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


