Re: Recommendations for building 1PB RadosGW with Erasure Code


 



Turns out I didn't do reply-all.

On Tue, Feb 16, 2016 at 9:18 AM, John Hogenmiller <john@xxxxxxxxxxxxxxx> wrote:
> And again - are dual Xeons powerful enough for a 60-disk node and Erasure Code?


This is something I've been attempting to determine as well, though I don't have a definitive answer yet.

I'm testing with some white-label hardware, essentially Supermicro 2U twin systems with a pair of E5-2609 Xeons and 64GB of memory (http://www.supermicro.com/products/system/2U/6028/SYS-6028TR-HTFR.cfm). Each node is attached to DAEs with 60 x 6TB drives, in JBOD.

Conversely, Supermicro sells a 72-disk OSD node, which Red Hat considers a supported "reference architecture" device. The processors in those nodes are 12-core E5-26xx parts, versus the quad-cores I have. http://www.supermicro.com/solutions/storage_ceph.cfm  (SSG-6048R-OSD432). I would highly recommend looking at the Supermicro hardware and using it as your reference as well. If you can get an eval unit, use it to compare against the hardware you're working with.

I currently have mine set up with 7 nodes, 60 OSDs each, radosgw running on each node, and 5 Ceph monitors. I plan to move the monitors to their own dedicated hardware, and from my reading, I may only need 3 to manage the 420 OSDs. I am currently set up for replication rather than EC, though I want to redo this cluster to use EC. I am also still trying to work out how much of an impact placement-group counts have on performance, and I may have a performance-hampering amount.
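For what it's worth, here is a quick sanity check of PG counts, a sketch based on the rule of thumb in the Ceph docs (total PGs across pools ~= OSDs x 100 / replica count, rounded up to a power of two). The pool sizes are assumptions on my part, not pulled from the cluster:

```python
# Rough PG-count sanity check, using the rule of thumb from the Ceph docs:
# total_pgs ~= (num_osds * 100) / replica_count, rounded up to the nearest
# power of two. The replica count of 3 below is an assumption.

def suggested_pg_count(num_osds, replica_count, target_pgs_per_osd=100):
    """Suggested total PG count, rounded up to a power of two."""
    raw = num_osds * target_pgs_per_osd / replica_count
    power = 1
    while power < raw:
        power *= 2
    return power

# With 420 OSDs and 3x replication:
print(suggested_pg_count(420, 3))   # 16384
```

Against that guideline, the 33,848 PGs shown in the `ceph status` output below would be roughly double the suggested total (around 240 PGs per OSD if all pools are size 3), which may be where the "performance-hampering amount" suspicion comes from.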

We test the system using Locust speaking S3 to the radosgw. Transactions are distributed equally across all 7 nodes and we track the statistics. We started by emulating 1000 users and got over 4Gbps, but the load average on all nodes was in the mid-100s, and after 15 minutes we started getting socket timeouts. We stopped the test, let the load settle, and restarted at 100 users. We've been running that test for about 5 days now. Load average on all nodes floats between 40 and 70. The nodes with ceph-mon running on them do not appear to be taxed any more than the ones without. The radosgw itself takes a decent amount of CPU (running civetweb, no SSL). iowait is nonexistent; everything appears to be CPU bound.

At 1000 users, we had 4.3Gbps of PUTs and 2.2Gbps of GETs. We did not capture the TPS on that short test.
At 100 users, we're pushing 2Gbps in PUTs and 1.24Gbps in GETs, averaging 115 TPS.
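Those numbers imply objects in the low-megabyte range. A back-of-the-envelope calculation, assuming the 115 TPS figure applies to the 2Gbps PUT stream (the TPS isn't broken down by operation, so treat this as rough):

```python
# Average object size implied by the 100-user run.
# Assumes the 115 TPS figure covers the PUT stream; the post does not
# break TPS down by operation, so this is only an estimate.

put_gbps = 2.0
put_tps = 115

bytes_per_second = put_gbps * 1e9 / 8            # 250 MB/s of PUT traffic
avg_object_mb = bytes_per_second / put_tps / 1e6

print(round(avg_object_mb, 1))                   # ~2.2 MB per PUT
```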

All in all, the speeds are not bad for a single rack, but the CPU utilization is a big concern. We currently run other (proprietary) object storage platforms on this same hardware configuration. They have their own set of issues, but CPU utilization is typically not one of them, even at higher utilization.
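Part of the motivation for moving to EC is capacity. A quick illustration of what's at stake, using the 419 in-service OSDs from the `ceph status` below; the k=8, m=3 profile is just an example profile, not something this cluster runs, and the math ignores filesystem and journal overhead:

```python
# Illustrative usable-capacity comparison: 3x replication vs an example
# erasure-code profile (k=8 data chunks, m=3 coding chunks). The EC
# profile is an assumption for illustration, not the cluster's config.

raw_tb = 419 * 6                      # 419 OSDs in, 6 TB drives each

rep_usable = raw_tb / 3               # replication stores 3 full copies
ec_usable = raw_tb * 8 / (8 + 3)      # EC stores k data chunks per k+m total

print(raw_tb, round(rep_usable), round(ec_usable))   # 2514 838 1828
```

Under those assumptions, EC roughly doubles usable capacity versus 3x replication, at the cost of the extra encode/decode CPU, which is exactly why the CPU headroom question above matters.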



root@ljb01:/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_OK
            election epoch 86, quorum 0,1,2,3,4 rain02-r01-01,rain02-r01-03,rain02-r01-04,hail02-r01-06,hail02-r01-08
     osdmap e2543: 423 osds: 419 up, 419 in
            flags sortbitwise
      pgmap v676131: 33848 pgs, 14 pools, 50834 GB data, 29660 kobjects
            149 TB used, 2134 TB / 2284 TB avail
               33848 active+clean
  client io 129 MB/s rd, 182 MB/s wr, 1562 op/s



 # ceph-osd + ceph-mon + radosgw
top - 13:29:22 up 40 days, 22:05,  1 user,  load average: 47.76, 47.33, 47.08
Tasks: 1001 total,   7 running, 994 sleeping,   0 stopped,   0 zombie
%Cpu(s): 39.2 us, 44.7 sy,  0.0 ni,  9.9 id,  2.4 wa,  0.0 hi,  3.7 si,  0.0 st
KiB Mem:  65873180 total, 64818176 used,  1055004 free,     9324 buffers
KiB Swap:  8388604 total,  7801828 used,   586776 free. 17610868 cached Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                               
 178129 ceph      20   0 3066452 618060   5440 S  54.6  0.9   2678:49 ceph-osd                              
 218049 ceph      20   0 6261880 179704   2872 S  33.4  0.3   1852:14 radosgw 
 165529 ceph      20   0 2915332 579064   4308 S  19.7  0.9 530:12.65 ceph-osd 
 185193 ceph      20   0 2932696 585724   4412 S  19.1  0.9 545:20.31 ceph-osd 
  52334 ceph      20   0 3030300 618868   4328 S  15.8  0.9 543:53.64 ceph-osd 
  23124 ceph      20   0 3037740 607088   4440 S  15.2  0.9 461:03.98 ceph-osd 
 154031 ceph      20   0 2982344 525428   4044 S  14.9  0.8 587:17.62 ceph-osd 
 191278 ceph      20   0 2835208 570100   4700 S  14.9  0.9 547:11.66 ceph-osd 
 
 # ceph-osd + radosgw (no ceph-mon)
 
 top - 13:31:22 up 40 days, 22:06,  1 user,  load average: 64.25, 59.76, 58.17
Tasks: 1015 total,   4 running, 1011 sleeping,   0 stopped,   0 zombie
%Cpu0  : 24.2 us, 48.5 sy,  0.0 ni, 10.9 id,  1.2 wa,  0.0 hi, 15.2 si,  0.0 st
%Cpu1  : 30.8 us, 49.7 sy,  0.0 ni, 13.5 id,  1.8 wa,  0.0 hi,  4.2 si,  0.0 st
%Cpu2  : 33.9 us, 49.5 sy,  0.0 ni, 10.5 id,  2.1 wa,  0.0 hi,  3.9 si,  0.0 st
%Cpu3  : 31.3 us, 52.4 sy,  0.0 ni, 10.5 id,  2.7 wa,  0.0 hi,  3.0 si,  0.0 st
%Cpu4  : 34.7 us, 41.3 sy,  0.0 ni, 20.4 id,  3.3 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu5  : 38.3 us, 36.7 sy,  0.0 ni, 21.0 id,  3.7 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu6  : 36.6 us, 37.5 sy,  0.0 ni, 19.8 id,  6.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 38.0 us, 38.0 sy,  0.0 ni, 19.5 id,  4.3 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:  65873180 total, 61946688 used,  3926492 free,     1260 buffers
KiB Swap:  8388604 total,  7080048 used,  1308556 free.  7910772 cached Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                               
 108861 ceph      20   0 6283364 279464   3024 S  27.6  0.4   1684:30 radosgw   
    546 root      20   0       0      0      0 R  23.4  0.0 579:55.28 kswapd0 
 184953 ceph      20   0 2971100 576784   4348 S  23.1  0.9   1265:58 ceph-osd 
 178967 ceph      20   0 2970500 594756   6000 S  18.9  0.9 505:27.13 ceph-osd
 184105 ceph      20   0 3096276 627944   7096 S  18.0  1.0 581:12.28 ceph-osd
  56073 ceph      20   0 2888244 542024   4836 S  13.5  0.8 530:41.86 ceph-osd
  55083 ceph      20   0 2819060 518500   5052 S  13.2  0.8 513:21.96 ceph-osd
 175578 ceph      20   0 3083096 630712   5564 S  13.2  1.0 591:10.91 ceph-osd
 180725 ceph      20   0 2915828 553240   4836 S  12.9  0.8 518:22.47 ceph-osd                                                                                             


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
