Re: EC on 1.1PB?

We're running 12 OSDs per node, with 32 hyper-threaded CPUs available. We over-provisioned the CPUs because we would like to additionally run jobs from our batch system and isolate them via cgroups (we're a high-throughput computing facility). With a total of ~13,000 PGs across a few pools, I'm seeing about 1GB of resident memory per OSD. As far as EC plugins go, we're using jerasure and haven't experimented with others.

That said, in our use case we're using CephFS, so we're fronting the erasure-coded pool with a cache tier. The cache pool is limited to 5TB, and right now usage is light enough that most operations live in the cache tier and rarely get flushed out to the EC pool. I'm sure as we bring more users onto this, there will be some more tweaking to do.

As far as performance goes, you might want to read Mark Nelson's excellent document about EC performance under Firefly. If you search the list archives, he sent a mail in February titled "Erasure Coding CPU Overhead Data". I can forward you the PDF off-list if you would like.

--Lincoln

On Jun 19, 2015, at 12:42 PM, Sean wrote:

Thanks, Lincoln! May I ask how many drives you have per storage node and how many threads you have available? I.e., are you using hyper-threading, and do you have more than 24 disks per node in your cluster? I noticed with our replicated cluster that more disks == more PGs == more CPU/RAM, and with 24+ disks this ends up causing issues in some cases. So a 3-node cluster with 70 disks each is fine, but scaling up to 21 nodes I see issues, even with connections, PIDs, and file descriptors turned up. Are you using just jerasure, or have you tried the ISA plugin as well?

Sorry for bombarding you with questions; I'm just curious as to where the 40% performance figure comes from.

On 06/19/2015 11:05 AM, Lincoln Bryant wrote:
Hi Sean,

We have ~1PB of EC storage using Dell R730xd servers with 6TB OSDs. We've got our erasure coding profile set up as k=10,m=3, which gives us a very reasonable chunk of the raw storage with nice resiliency.
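For the curious, the space math behind k=10,m=3 works out roughly as follows. This is just a back-of-the-envelope sketch in Python; the 1 PB figure is treated as raw capacity purely for illustration, not the actual cluster size:

    # Back-of-the-envelope space math for a k=10,m=3 EC profile.
    # (The 1 PB raw figure is illustrative only.)
    k, m = 10, 3                         # data chunks, coding chunks
    raw_pb = 1.0                         # assumed raw capacity, in PB

    efficiency = float(k) / (k + m)      # fraction of raw space usable for data
    overhead = float(k + m) / k          # on-disk footprint per byte stored

    print("storage efficiency: %.1f%%" % (efficiency * 100))            # ~76.9%
    print("space overhead:     %.2fx (vs. 3.00x for 3-way replication)" % overhead)
    print("usable capacity:    %.2f PB per %.1f PB raw" % (raw_pb * efficiency, raw_pb))

In other words, k=10,m=3 stores data at roughly 1.3x overhead instead of replication's 3x, while tolerating the loss of any three chunks (whether that means three OSDs or three hosts depends on the CRUSH failure domain).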

I found that CPU usage was significantly higher in EC, but not so much as to be problematic. Additionally, EC performance was about 40% of replicated pool performance in our testing. 

With 36-disk servers you'll probably need to make sure you do the usual kernel tweaks like increasing the max number of file descriptors, etc. 
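If it helps, here is a quick, non-Ceph-specific way to check where a node currently stands on open-file limits before tuning (the actual tuning of course happens via sysctl, limits.conf, or the service unit, not via Python):

    # Quick sanity check of open-file limits on a Linux node.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open('/proc/sys/fs/file-max') as f:
        system_wide = int(f.read().strip())

    print("per-process fd limit (soft/hard): %d / %d" % (soft, hard))
    print("system-wide open-file cap:        %d" % system_wide)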

Cheers,
Lincoln

On Jun 19, 2015, at 10:36 AM, Sean wrote:

I am looking to use Ceph with EC on a few leftover storage servers (36-disk Supermicro servers with dual Xeon sockets and around 256GB of RAM). I did a small test using one node and the ISA library, and noticed that the CPU load was pretty spiky for just normal operation.

Does anyone have any experience running Ceph EC on around 216 to 270 4TB disks? I'm looking to yield around 680TB to 1PB if possible. Just putting my feelers out there to see if anyone else has had any experience, and looking for any guidance.
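For a rough sense of the numbers: assuming something like a k=10,m=3 profile (purely an assumption for illustration; pick your own k/m), the usable range for 216-270 x 4TB disks works out roughly as below, before leaving headroom for rebalancing and near-full ratios:

    # Rough usable-capacity range for 216-270 x 4TB disks under an assumed
    # k=10,m=3 erasure-coding profile (illustrative only).
    disk_tb = 4
    k, m = 10, 3
    efficiency = float(k) / (k + m)      # fraction of raw space available for data

    for n_disks in (216, 270):
        raw_tb = n_disks * disk_tb
        usable_tb = raw_tb * efficiency
        print("%d disks: %d TB raw -> ~%.0f TB usable (before nearfull headroom)"
              % (n_disks, raw_tb, usable_tb))

That lands at roughly 660-830 TB usable, so the lower end of the 680TB-1PB target is reachable with this profile, while the upper end would need either more spindles or a wider k.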



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
