Re: Recommendations for building 1PB RadosGW with Erasure Code


 



> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Balzer
> Sent: 17 February 2016 02:41
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  Recommendations for building 1PB RadosGW with
> Erasure Code
> 
> 
> Hello,
> 
> On Tue, 16 Feb 2016 16:39:06 +0800 Василий Ангапов wrote:
> 
> > Nick, Tyler, many thanks for very helpful feedback!
> > I spent many hours meditating on the following two links:
> > http://www.supermicro.com/solutions/storage_ceph.cfm
> > http://s3s.eu/cephshop
> >
> > 60- or even 72-disk nodes are very capacity-efficient, but will the 2
> > CPUs (even the fastest ones) be enough to handle Erasure Coding?
> >
> Depends.
> Since you're doing sequential writes (and reads too, I assume, since you're
> dealing with videos), CPU usage is going to be a lot lower than with random,
> small 4KB block I/Os.
> So most likely, yes.

That was my initial thought, but reading that paper I linked, the 4MB tests are the ones that bring the CPUs to their knees. I think the erasure calculation is a large part of the overall CPU usage, and more data with the larger IOs causes a significant increase in CPU requirements.

Correct me if I'm wrong, Christian, but I recall that your cluster is an all-SSD cluster? I think we touched on this before: the GHz-per-OSD rule is probably more like 100MHz per IOP. In a spinning-disk cluster you effectively have a cap on the number of IOs you can serve before the disks max out, so the difference between large and small IOs is not that great. But on an SSD cluster there is no such cap, so you just end up with more IOs, hence the higher CPU usage.
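
To put that scaling argument in rough numbers, here is a back-of-envelope sketch in Python; the IOPS figures below are my own assumptions, not measurements:

    # CPU load on an OSD node scales roughly with the IOPS it actually serves,
    # so spinners cap the load while SSDs do not (illustrative figures only).
    SPINNER_IOPS = 150       # assumed ceiling for a 7.2k SATA disk
    SSD_IOPS = 15000         # assumed ceiling for a decent SATA SSD

    ratio = SSD_IOPS / SPINNER_IOPS
    print("An SSD OSD can serve ~%.0fx the IOPS of a spinner, so it needs"
          " roughly that multiple of CPU, whatever MHz-per-IOP you assume."
          % ratio)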

> 
> > Also, as Nick stated, with 4-5 nodes I cannot use high-M "K+M"
> > combinations. I've done some calculations and found that the most
> > efficient and safe configuration is to use 10 nodes with 29x 6TB SATA
> > drives and 7x 200GB S3700s for journals. Assuming a 6+3 EC profile,
> > that will give me 1.16 PB of effective space. Also, I prefer not to
> > use precious NVMe drives; I don't see any reason to use them.
> >
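
For what it's worth, the capacity arithmetic checks out. A quick sketch, using decimal TB and ignoring filesystem/Ceph overhead (which will eat a few percent in practice):

    nodes, osds_per_node, tb_per_osd = 10, 29, 6
    k, m = 6, 3

    raw_tb = nodes * osds_per_node * tb_per_osd    # 1740 TB raw
    usable_tb = raw_tb * k / (k + m)               # 6+3 keeps 2/3 of raw
    print(raw_tb, usable_tb)                       # 1740 1160.0 -> ~1.16 PB
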
> This is probably your best way forward; dense is nice and cost-saving, but
> it comes with a lot of potential gotchas.
> Dense and large clusters can work; dense and small, not so much.
> 
> > But what about RAM? Can I go with 64GB per node with the above config?
> > I've seen OSDs consuming no more than 1GB of RAM each for replicated
> > pools (even 6TB ones). But what is the typical memory usage of EC
> > pools? Does anybody know?
> >
> With the above config (29 OSDs) that would be just about right.
> I always go with at least 2GB of RAM per OSD, since during a full node
> restart and the subsequent peering, OSDs will grow a LOT larger than their
> usual steady-state size.
> RAM isn't that expensive these days, and additional RAM comes in very handy
> when used for pagecache and SLAB (dentry) stuff.
> 
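
Quick check of that rule of thumb against the proposed 64GB nodes (the 2GB figure is the peering/recovery worst case mentioned above, not steady state):

    osds_per_node = 29
    gb_per_osd = 2                      # worst case during peering/recovery
    print(osds_per_node * gb_per_osd)   # 58 GB -> 64GB is about the floor,
                                        # anything extra goes to pagecache/SLAB
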
> Something else to think about in your specific use case is having RAID'ed
> OSDs.
> It's probably a bit of a zero-sum game, but compare the above config with
> this one:
> 11 nodes, each with:
> 34x 6TB SATAs (2x 17-HDD RAID6)
> 2x 200GB S3700 SSDs (journal/OS)
> Just 2 OSDs per node.
> Ceph with a replication of 2.
> Just shy of 1PB of effective space.
> 
> Minus: More physical space, less efficient HDD usage (replication vs. EC).
> 
> Plus: A lot fewer expensive SSDs, lower CPU and RAM requirements, and a
> smaller impact in case of node failure/maintenance.
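
The numbers behind "just shy of 1PB", as I read them (same decimal-TB and overhead caveats as above):

    nodes = 11
    raid_sets_per_node = 2
    disks_per_raid6 = 17                # 15 data + 2 parity per RAID6 set
    tb_per_disk = 6
    replication = 2

    data_disks = nodes * raid_sets_per_node * (disks_per_raid6 - 2)   # 330
    usable_tb = data_disks * tb_per_disk / replication
    print(usable_tb)                    # 990.0 TB -> "just shy of 1PB"
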
> 
> No ideas about the stuff below.
> 
> Christian
> > Also, am I right that for a 6+3 EC profile I need at least 10 nodes to
> > feel comfortable (one extra node for redundancy)?
> >
> > And finally, can someone recommend which EC plugin to use in my case? I
> > know it's a difficult question, but still.
> >
> >
> > 2016-02-16 16:12 GMT+08:00 Nick Fisk <nick@xxxxxxxxxx>:
> > >
> > >
> > >> -----Original Message-----
> > >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> > >> Behalf Of Tyler Bishop
> > >> Sent: 16 February 2016 04:20
> > >> To: Василий Ангапов <angapov@xxxxxxxxx>
> > >> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> > >> Subject: Re:  Recommendations for building 1PB RadosGW
> > >> with Erasure Code
> > >>
> > >> You should look at a 60 bay 4U chassis like a Cisco UCS C3260.
> > >>
> > >> We run 4 systems at 56x 6TB with dual E5-2660 v2 CPUs and 256GB of
> > >> RAM. Performance is excellent.
> > >
> > > The only thing I will say to the OP is that if you only need 1PB, then
> > > 4-5 of these will likely give you enough capacity. Personally I
> > > would prefer to spread the capacity around more nodes. If you are
> > > doing anything serious with Ceph, it's normally a good idea to try
> > > to make each node no more than 10% of total capacity. Also, with EC
> > > pools you will be limited in the K+M combos you can achieve with a
> > > smaller number of nodes.
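
To illustrate the node-count constraint (assuming the usual failure domain of "host", so every chunk lands on a different host, plus at least one spare host so recovery has somewhere to go):

    def min_hosts(k, m, spare_hosts=1):
        # k+m hosts to place all chunks, plus spare capacity to rebuild into
        return k + m + spare_hosts

    for k, m in [(3, 2), (4, 2), (6, 3), (8, 3)]:
        print("%d+%d: needs >= %d hosts, %.0f%% of raw space usable"
              % (k, m, min_hosts(k, m), 100.0 * k / (k + m)))
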
> > >
> > >>
> > >> I would recommend a cache tier for sure if your data is read-heavy.
> > >>
> > >> Tyler Bishop
> > >> Chief Technical Officer
> > >> 513-299-7108 x10
> > >>
> > >>
> > >>
> > >> Tyler.Bishop@xxxxxxxxxxxxxxxxx
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: "Василий Ангапов" <angapov@xxxxxxxxx>
> > >> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> > >> Sent: Friday, February 12, 2016 7:44:07 AM
> > >> Subject:  Recommendations for building 1PB RadosGW with
> > >> Erasure Code
> > >>
> > >> Hello,
> > >>
> > >> We are planning to build a 1PB Ceph cluster for RadosGW with Erasure
> > >> Code. It will be used for storing online videos.
> > >> We do not expect outstanding write performance; something like
> > >> 200-300MB/s of sequential writes will be quite enough, but data
> > >> safety is very important.
> > >> What are the most popular hardware and software recommendations?
> > >> 1) What EC profile is best to use? What values of K/M do you
> > >> recommend?
> > >
> > > The higher you go in total k+m, the more CPU you will require, and
> > > sequential performance will degrade slightly as the IOs going to the
> > > disks get smaller. However, larger numbers allow you to be more
> > > creative with failure scenarios and "replication" efficiency.
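
A rough illustration of the chunk-size effect (assuming a plain k+m profile where each object is split into k data chunks plus m coding chunks):

    object_size_kb = 4096               # e.g. a 4MB RADOS object
    for k, m in [(4, 2), (6, 3), (10, 4)]:
        chunk_kb = object_size_kb / k
        print("%d+%d: %d chunks of ~%.0fKB each hit the disks per object"
              % (k, m, k + m, chunk_kb))
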
> > >
> > >> 2) Do I need to use a Cache Tier for RadosGW, or is it only needed
> > >> for RBD? Is it
> > >
> > > It's only needed for RBD, but depending on the workload it may still
> > > be of benefit. If you are mostly doing large IOs, the gains will be a
> > > lot smaller.
> > >
> > >> still an overall good practice to use Cache Tier for RadosGW?
> > >> 3) What hardware is recommended for EC? I assume higher-clocked
> > >> CPUs are needed? What about RAM?
> > >
> > > Total GHz is more important (i.e. GHz x cores). Go with the
> > > cheapest/most power-efficient CPUs you can get. Aim for somewhere
> > > around 1GHz per disk.
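
As a rough check of that rule against a 29-disk node (the CPU figures below are just an example part, not a recommendation):

    disks_per_node = 29
    target_ghz = disks_per_node * 1.0             # ~29 GHz wanted per node

    sockets, cores, clock_ghz = 2, 8, 2.1         # e.g. a modest dual 8-core
    available_ghz = sockets * cores * clock_ghz   # 33.6 GHz available
    print(target_ghz, available_ghz)              # comfortable margin
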
> > >
> > >> 4) What SSDs for Ceph journals are the best?
> > >
> > > Intel S3700 or P3700 (if you can stretch)
> > >
> > > By all means explore other options, but you can't go wrong by buying
> > > these. Think of the "You can't get fired for buying Cisco" quote!
> > >
> > >>
> > >> Thanks a lot!
> > >>
> > >> Regards, Vasily.
> > >> _______________________________________________
> > >> ceph-users mailing list
> > >> ceph-users@xxxxxxxxxxxxxx
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



