> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Christian Balzer
> Sent: 17 February 2016 02:41
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Recommendations for building 1PB RadosGW with Erasure Code
>
> Hello,
>
> On Tue, 16 Feb 2016 16:39:06 +0800 Василий Ангапов wrote:
>
> > Nick, Tyler, many thanks for the very helpful feedback!
> > I spent many hours meditating on the following two links:
> > http://www.supermicro.com/solutions/storage_ceph.cfm
> > http://s3s.eu/cephshop
> >
> > 60- or even 72-disk nodes are very capacity-efficient, but will the 2
> > CPUs (even the fastest ones) be enough to handle Erasure Coding?
> >
> Depends.
> Since you're doing sequential writes (and reads, I assume, as you're dealing
> with videos), CPU usage is going to be a lot lower than with random, small
> 4KB block I/Os.
> So most likely, yes.

That was my initial thought, but reading that paper I linked, the 4MB tests
are the ones that bring the CPUs to their knees. I think the erasure
calculation is a large part of the overall CPU usage, and more data with the
larger I/Os causes a significant increase in CPU requirements.

Correct me if I'm wrong, Christian, but I recall that your cluster is an
all-SSD cluster? I think we touched on this before: the GHz per OSD is
probably more like 100MHz per IOP. In a spinning-disk cluster you effectively
have a cap on the number of I/Os you can serve before the disks max out, so
the difference between large and small I/Os is not that great. But on an SSD
cluster there is no such cap, so you just end up with more I/Os, hence the
higher CPU usage.

> >
> > Also, as Nick stated, with 4-5 nodes I cannot use high-M "K+M"
> > combinations. I did some calculations and found that the most
> > efficient and safe configuration is to use 10 nodes with 29*6TB SATA
> > and 7*200GB S3700 for journals. Assuming a 6+3 EC profile, that will
> > give me 1.16 PB of effective space. Also I prefer not to use precious
> > NVMe drives. I don't see any reason to use them.
> >
> This is probably your best way forward; dense is nice and cost saving, but
> comes with a lot of potential gotchas.
> Dense and large clusters can work, dense and small not so much.
>
> > But what about RAM? Can I go with 64GB per node with the above config?
> > I've seen OSDs consuming no more than 1GB RAM for replicated
> > pools (even 6TB ones). But what is the typical memory usage of EC
> > pools? Does anybody know that?
> >
> With the above config (29 OSDs) that would be just about right.
> I always go with at least 2GB RAM per OSD, since during a full node restart
> and the consecutive peering OSDs will grow large, a LOT larger than their
> usual steady-state size.
> RAM isn't that expensive these days, and additional RAM comes in very
> handy when used for pagecache and SLAB (dentry) stuff.
>
> Something else to think about in your specific use case is to have RAID'ed
> OSDs.
> It's a bit of a zero-sum game probably, but compare the above config with this:
> 11 nodes, each with:
> 34x 6TB SATA (2x 17-HDD RAID6)
> 2x 200GB S3700 SSDs (journal/OS)
> Just 2 OSDs per node.
> Ceph with replication of 2.
> Just shy of 1PB of effective space.
>
> Minus: more physical space, less efficient HDD usage (replication vs. EC).
>
> Plus: a lot less expensive SSDs, less CPU and RAM requirements, smaller
> impact in case of node failure/maintenance.
>
> No ideas about the stuff below.
>
> Christian
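Just to sanity-check the two layouts side by side, here is a rough
back-of-the-envelope sketch in Python. The helper names are mine, the figures
are the ones quoted in this thread, and the 2GB-per-OSD RAM number is only the
rule of thumb above, not an official sizing guide:

# Rough capacity/RAM sanity check for the two layouts discussed above.
# All figures come from this thread; treat them as assumptions, not
# measurements.

def usable_ec(nodes, osds_per_node, disk_tb, k, m):
    """Usable capacity of an EC pool: raw * k / (k + m)."""
    raw = nodes * osds_per_node * disk_tb
    return raw * k / float(k + m)

def usable_raid6_repl(nodes, raid_sets, disks_per_set, disk_tb, replicas):
    """Usable capacity of RAID6-backed OSDs under N-way replication.
    Each RAID6 set loses two disks to parity."""
    data_disks = raid_sets * (disks_per_set - 2)
    raw = nodes * data_disks * disk_tb
    return raw / float(replicas)

# Layout A: 10 nodes, 29x 6TB OSDs each, EC 6+3
a = usable_ec(nodes=10, osds_per_node=29, disk_tb=6.0, k=6, m=3)

# Layout B: 11 nodes, 2x 17-disk RAID6 of 6TB drives each, replication 2
b = usable_raid6_repl(nodes=11, raid_sets=2, disks_per_set=17,
                      disk_tb=6.0, replicas=2)

print("EC 6+3 layout:      %.0f TB usable (~%.2f PB)" % (a, a / 1000))
print("RAID6 + 2x replica: %.0f TB usable (~%.2f PB)" % (b, b / 1000))

# RAM per node at the ~2GB-per-OSD rule of thumb mentioned above:
print("RAM, layout A: ~%d GB/node (so 64GB fits)" % (29 * 2))
print("RAM, layout B: ~%d GB/node (only 2 OSDs)" % (2 * 2))

Both come out where stated above (~1.16PB usable for the EC layout, just under
1PB for the RAID6 one), so the real trade-off is the one Christian lists:
SSD/CPU/RAM cost and failure impact versus raw HDD efficiency.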
> > Also, am I right that for a 6+3 EC profile I need at least 10 nodes to
> > feel comfortable (one extra node for redundancy)?
> >
> > And finally, can someone recommend what EC plugin to use in my case? I
> > know it's a difficult question, but anyway?
> >
> > 2016-02-16 16:12 GMT+08:00 Nick Fisk <nick@xxxxxxxxxx>:
> > >
> > >> -----Original Message-----
> > >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> > >> Behalf Of Tyler Bishop
> > >> Sent: 16 February 2016 04:20
> > >> To: Василий Ангапов <angapov@xxxxxxxxx>
> > >> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> > >> Subject: Re: Recommendations for building 1PB RadosGW with Erasure Code
> > >>
> > >> You should look at a 60-bay 4U chassis like a Cisco UCS C3260.
> > >>
> > >> We run 4 systems at 56x 6TB with dual E5-2660 v2 and 256GB RAM.
> > >> Performance is excellent.
> > >
> > > The only thing I will say to the OP is that if you only need 1PB, then
> > > likely 4-5 of these will give you enough capacity. Personally I
> > > would prefer to spread the capacity around more nodes. If you are
> > > doing anything serious with Ceph, it's normally a good idea to try and
> > > make each node no more than 10% of total capacity. Also, with EC
> > > pools you will be limited in the K+M combos you can achieve with
> > > a smaller number of nodes.
> > >
> > >> I would recommend a cache tier for sure if your data is busy for
> > >> reads.
> > >>
> > >> Tyler Bishop
> > >> Chief Technical Officer
> > >> 513-299-7108 x10
> > >>
> > >> Tyler.Bishop@xxxxxxxxxxxxxxxxx
> > >>
> > >> If you are not the intended recipient of this transmission you are
> > >> notified that disclosing, copying, distributing or taking any
> > >> action in reliance on the contents of this information is strictly
> > >> prohibited.
> > >>
> > >> ----- Original Message -----
> > >> From: "Василий Ангапов" <angapov@xxxxxxxxx>
> > >> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> > >> Sent: Friday, February 12, 2016 7:44:07 AM
> > >> Subject: Recommendations for building 1PB RadosGW with Erasure Code
> > >>
> > >> Hello,
> > >>
> > >> We are planning to build a 1PB Ceph cluster for RadosGW with Erasure
> > >> Code. It will be used for storing online videos.
> > >> We do not expect outstanding write performance; something like
> > >> 200-300MB/s of sequential write will be quite enough, but data safety
> > >> is very important.
> > >> What are the most popular hardware and software recommendations?
> > >> 1) What EC profile is best to use? What values of K/M do you
> > >> recommend?
> > >
> > > The higher the total k+m you go, the more CPU you will require, and
> > > sequential performance will degrade slightly as the I/Os going to the
> > > disks are smaller. However, larger numbers allow you to be more
> > > creative with failure scenarios and "replication" efficiency.
> > >
> > >> 2) Do I need to use Cache Tier for RadosGW or is it only needed for
> > >> RBD? Is it
> > >
> > > Only needed for RBD, but depending on workload it may still benefit.
> > > If you are mostly doing large I/Os, the gains will be a lot smaller.
> > >
> > >> still an overall good practice to use Cache Tier for RadosGW?
> > >> 3) What hardware is recommended for EC? I assume higher-clocked
> > >> CPUs are needed? What about RAM?
> > >
> > > Total GHz is more important (i.e. GHz x cores). Go with the
> > > cheapest/most power-efficient you can get. Aim for somewhere around
> > > 1GHz per disk.
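To put rough numbers on Vasily's "one extra node" question further up and on
the 1GHz-per-disk rule of thumb just above, here is a small Python sketch. The
k/m list and the dual 10-core 2.3GHz example CPU are illustrative assumptions
only:

# Quick feasibility / CPU sizing check.
# The "k + m + 1 spare host" rule and the ~1GHz-per-disk figure are
# rules of thumb from this thread, not hard limits.

def min_nodes(k, m, spares=1):
    """With a failure domain of host you want at least k+m hosts,
    plus spare hosts so recovery has somewhere to go."""
    return k + m + spares

def cpu_ghz_per_node(osds_per_node, ghz_per_disk=1.0):
    """Aggregate clock (GHz x cores) to aim for per node."""
    return osds_per_node * ghz_per_disk

for k, m in [(4, 2), (6, 3), (8, 3), (10, 4)]:
    print("EC %d+%d: %.0f%% usable, want >= %d hosts" %
          (k, m, 100.0 * k / (k + m), min_nodes(k, m)))

# 29 OSDs per node at ~1GHz per disk -> ~29 GHz x cores aggregate.
need = cpu_ghz_per_node(29)
dual = 2 * 10 * 2.3  # e.g. a hypothetical dual 10-core 2.3GHz box
print("29 OSDs/node: aim for ~%.0f GHz x cores; dual 10c/2.3GHz = %.0f" %
      (need, dual))

For 6+3 with a host failure domain that gives the 10 nodes mentioned above,
and a 29-disk node wants roughly 29GHz of aggregate clock for this
mostly-sequential workload, which a sensible dual-socket board provides with
headroom.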
> > >
> > >> 4) What SSDs for Ceph journals are the best?
> > >
> > > Intel S3700 or P3700 (if you can stretch).
> > >
> > > By all means explore other options, but you can't go wrong by buying
> > > these. Think of the "You can't get fired for buying Cisco" quote!!!
> > >
> > >> Thanks a lot!
> > >>
> > >> Regards, Vasily.
>
> --
> Christian Balzer           Network/Systems Engineer
> chibi@xxxxxxx              Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com