> On 27 March 2017 at 13:22, Christian Balzer <chibi@xxxxxxx> wrote:
>
>
> Hello,
>
> On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
>
> > Hello all,
> > we are currently in the process of buying new hardware to expand an
> > existing Ceph cluster that already has 1200 OSDs.
>
> That's quite sizable; is the expansion driven by the need for more space
> (big data?), by the need for more IOPS, or both?
>
> > We are currently using 24 * 4 TB SAS drives per OSD node, with an SSD
> > journal shared among 4 OSDs. For the upcoming expansion we were thinking
> > of switching to either 6 or 8 TB hard drives (9 or 12 per host) in order
> > to drive down space and cost requirements.
> >
> > Does anyone have experience with mid-sized/large deployments using such
> > hard drives? Our main concern is the rebalance time, but we might be
> > overlooking other aspects.
> >
>
> If you researched the ML archives, you should already know to stay well
> away from SMR HDDs.
>

Amen! Just don't. Stay away from SMR with Ceph.

> Both HGST and Seagate have large Enterprise HDDs that have
> journals/caches (MediaCache in HGST speak, IIRC) that drastically improve
> write IOPS compared to plain HDDs.
> Even with SSD journals you will want to consider those, as these new HDDs
> will see at least twice the action of your current ones.
>

I also have good experience with bcache on an NVMe device in Ceph clusters: a single Intel P3600/P3700 serving as the caching device for bcache.

> Rebalance time is a concern of course, especially if your cluster, like
> most HDD-based ones, has these things throttled down to not impede actual
> client I/O.
>
> To get a rough idea, take a look at:
> https://www.memset.com/tools/raid-calculator/
>
> For Ceph with replication 3 and the typical PG distribution, assume 100
> disks; the RAID6 with hotspares numbers are the relevant ones.
> For rebuild speed, consult your experience, you must have had a few
> failures. ^o^
>
> For example, with a recovery speed of 100MB/s, a 1TB disk (used data with
> Ceph, actually) looks decent at 1:16000 DLO/y.
> At 5TB, though, it enters scary land.
>

Yes, those recoveries will take a long time. Let's say your 6TB drive is 80% full; then you need to rebalance 4.8TB:

4.8TB / 100MB/sec ≈ 13 hours rebuild time

13 hours is a long time, and you will probably not have 100MB/sec sustained; I think 50MB/sec is much more realistic.

That means recovering from a single disk failure will take more than 24 hours.

I don't like very big disks that much. Not in RAID, not in Ceph.

Wido

> Christian
>
> > We currently use the cluster as storage for OpenStack services: Glance,
> > Cinder and VMs' ephemeral disks.
> >
> > Thanks in advance for any advice.
> >
> > Mattia
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
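
To make it easy to play with the rebuild-time arithmetic above, here is a minimal back-of-the-envelope sketch in Python. It assumes the figures used in this thread (drives roughly 80% full, 50-100MB/sec sustained recovery); the drive sizes and speeds are just illustrative defaults, so plug in your own cluster's numbers.

# Rough rebuild-time estimate for a single failed drive/OSD,
# following the arithmetic in the thread above:
# used data / sustained recovery speed.

def rebuild_hours(drive_tb, fill_ratio=0.8, recovery_mb_s=100.0):
    """Hours to re-replicate the used data of one failed drive."""
    used_mb = drive_tb * 1_000_000 * fill_ratio   # TB -> MB (decimal units)
    return used_mb / recovery_mb_s / 3600         # seconds -> hours

if __name__ == "__main__":
    for size_tb in (4, 6, 8):
        for speed in (100, 50):
            print(f"{size_tb}TB drive, {speed}MB/sec sustained: "
                  f"{rebuild_hours(size_tb, recovery_mb_s=speed):.1f} hours")

For a 6TB drive this gives roughly 13 hours at 100MB/sec and roughly 27 hours at 50MB/sec, matching the numbers quoted above.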