Re: EC on 1.1PB?

We're running 12 OSDs per node, with 32 hyper-threaded CPUs available. We over-provisioned the CPUs because we would like to additionally run jobs from our batch system and isolate them via cgroups (we're a high-throughput computing facility). With a total of ~13,000 PGs across a few pools, I'm seeing about 1GB of resident memory per OSD. As far as EC plugins go, we're using jerasure and haven't experimented with others.

That said, in our use case we're using CephFS, so we're fronting the erasure-coded pool with a cache tier. The cache pool is limited to 5TB, and right now usage is light enough that most operations live in the cache tier and rarely get flushed out to the EC pool. I'm sure as we bring more users onto this, there will be some more tweaking to do.

As far as performance goes, you might want to read Mark Nelson's excellent document about EC performance under Firefly. If you search the list archives, he sent a mail in February titled "Erasure Coding CPU Overhead Data". I can forward you the PDF off-list if you would like.

--Lincoln

On Jun 19, 2015, at 12:42 PM, Sean wrote:

Thanks, Lincoln! May I ask how many drives you have per storage node and how many threads you have available? I.e., are you using hyper-threading, and do you have more than 24 disks per node in your cluster? I noticed with our replicated cluster that more disks == more PGs == more CPU/RAM, and with 24+ disks this ends up causing issues in some cases. So a 3-node cluster with 70 disks each is fine, but scaling up to 21 nodes I see issues, even with connections, PIDs, and file descriptors turned up. Are you using just jerasure, or have you tried the ISA plugin as well?

Sorry for bombarding you with questions; I'm just curious as to where the 40% performance figure comes from.

On 06/19/2015 11:05 AM, Lincoln Bryant wrote:
Hi Sean,

We have ~1PB of EC storage using Dell R730xd servers with 6TB OSDs. We've got our erasure coding profile set up as k=10,m=3, which gives us a very reasonable chunk of the raw storage with nice resiliency.
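For the curious, the space math behind k=10,m=3 works out roughly as follows. This is just a back-of-the-envelope sketch in Python; the 1 PB figure is treated as raw capacity purely for illustration, not the actual cluster size:

    # Back-of-the-envelope space math for a k=10,m=3 EC profile.
    # (The 1 PB raw figure is illustrative only.)
    k, m = 10, 3                         # data chunks, coding chunks
    raw_pb = 1.0                         # assumed raw capacity, in PB

    efficiency = float(k) / (k + m)      # fraction of raw space usable for data
    overhead = float(k + m) / k          # on-disk footprint per byte stored

    print("storage efficiency: %.1f%%" % (efficiency * 100))            # ~76.9%
    print("space overhead:     %.2fx (vs. 3.00x for 3-way replication)" % overhead)
    print("usable capacity:    %.2f PB per %.1f PB raw" % (raw_pb * efficiency, raw_pb))

In other words, k=10,m=3 stores data at roughly 1.3x overhead instead of replication's 3x, while tolerating the loss of any three chunks (whether that means three OSDs or three hosts depends on the CRUSH failure domain).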

I found that CPU usage was significantly higher in EC, but not so much as to be problematic. Additionally, EC performance was about 40% of replicated pool performance in our testing. 

With 36-disk servers you'll probably need to make sure you do the usual kernel tweaks like increasing the max number of file descriptors, etc. 
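If it helps, here is a quick, non-Ceph-specific way to check where a node currently stands on open-file limits before tuning (the actual tuning of course happens via sysctl, limits.conf, or the service unit, not via Python):

    # Quick sanity check of open-file limits on a Linux node.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open('/proc/sys/fs/file-max') as f:
        system_wide = int(f.read().strip())

    print("per-process fd limit (soft/hard): %d / %d" % (soft, hard))
    print("system-wide open-file cap:        %d" % system_wide)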

Cheers,
Lincoln

On Jun 19, 2015, at 10:36 AM, Sean wrote:

I am looking to use Ceph with EC on a few leftover storage servers (36-disk Supermicro servers with dual Xeon sockets and around 256GB of RAM). I did a small test using one node and the ISA library, and noticed that the CPU load was pretty spiky for just normal operation.

Does anyone have any experience running Ceph EC on around 216 to 270 4TB disks? I'm looking to yield around 680TB to 1PB if possible. Just putting my feelers out there to see if anyone else has had any experience, and looking for any guidance.
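For a rough sense of the numbers: assuming something like a k=10,m=3 profile (purely an assumption for illustration; pick your own k/m), the usable range for 216-270 x 4TB disks works out roughly as below, before leaving headroom for rebalancing and near-full ratios:

    # Rough usable-capacity range for 216-270 x 4TB disks under an assumed
    # k=10,m=3 erasure-coding profile (illustrative only).
    disk_tb = 4
    k, m = 10, 3
    efficiency = float(k) / (k + m)      # fraction of raw space available for data

    for n_disks in (216, 270):
        raw_tb = n_disks * disk_tb
        usable_tb = raw_tb * efficiency
        print("%d disks: %d TB raw -> ~%.0f TB usable (before nearfull headroom)"
              % (n_disks, raw_tb, usable_tb))

That lands at roughly 660-830 TB usable, so the lower end of the 680TB-1PB target is reachable with this profile, while the upper end would need either more spindles or a wider k.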



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
