Ah, typo: I meant to say 10MHz per IO. So a 7.2k disk does around 80 IOPS = ~800MHz, which is close to the 1GHz figure.

From: John Hogenmiller [mailto:john@xxxxxxxxxxxxxxx]
Sent: 17 February 2016 13:15
To: Nick Fisk <nick@xxxxxxxxxx>
Cc: Василий Ангапов <angapov@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Recomendations for building 1PB RadosGW with Erasure Code

I hadn't come across this ratio before, but now that I've read the PDF you linked and narrowed my search of the mailing list, I think the 0.5 - 1GHz per OSD ratio is pretty spot on. The 100MHz per IOP is also pretty interesting, and we do indeed use 7200 RPM drives.

I'll look up a few more things, but based on what I've seen so far, the hardware we're using will most likely not be suitable, which is unfortunate as that adds some more complexity at OSI Level 8. :D

On Wed, Feb 17, 2016 at 4:14 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:

Thanks for posting your experiences John, very interesting read. I think the golden rule of around 1GHz per OSD is still a realistic goal to aim for. It looks like you probably have around 16GHz for 60 OSDs, or 0.26GHz per OSD. Do you have any idea how much CPU you think you would need to just be able to get away with it?
I have 24GHz for 12 OSDs (2x 2620v2) and I typically don't see CPU usage over about 20%, which indicates to me that the bare minimum for a replicated pool is probably around 0.5GHz per 7.2k rpm OSD. The next nodes we have will certainly have less CPU.
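
The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the approximate numbers quoted in this thread (10MHz per IO, about 80 IOPS per 7.2k rpm disk, 16GHz for 60 OSDs, 24GHz for 12 OSDs at roughly 20% usage); the function names are made up for illustration and nothing here is a measurement.

```python
# Rough check of the CPU-per-OSD figures quoted in the thread.
# Every input is an approximate number taken from the emails above.

MHZ_PER_IO = 10        # rule of thumb from the thread (corrected from 100MHz)
IOPS_7200RPM = 80      # rough IOPS for a single 7.2k rpm disk


def ghz_per_osd(total_ghz, osd_count):
    """Aggregate CPU clock available per OSD on one node, in GHz."""
    return total_ghz / osd_count


def ghz_needed_per_disk(iops, mhz_per_io=MHZ_PER_IO):
    """CPU clock one disk would need at the given IOPS, in GHz."""
    return iops * mhz_per_io / 1000.0


def ghz_used_per_osd(total_ghz, osd_count, cpu_utilisation):
    """CPU clock actually consumed per OSD at an observed utilisation (0..1)."""
    return ghz_per_osd(total_ghz, osd_count) * cpu_utilisation


if __name__ == "__main__":
    # 80 IOPS x 10MHz per IO, the "close to 1GHz" figure
    print("needed per 7.2k disk: %.2f GHz" % ghz_needed_per_disk(IOPS_7200RPM))
    # John's nodes: about 16GHz shared by 60 OSDs
    print("60-OSD node:          %.2f GHz per OSD" % ghz_per_osd(16, 60))
    # Nick's nodes: 24GHz for 12 OSDs, about 20% busy
    print("12-OSD node in use:   %.2f GHz per OSD" % ghz_used_per_osd(24, 12, 0.20))
```

The last figure comes out a little under the "around 0.5GHz" quoted above, which is consistent with that number being a rounded-up floor rather than an exact result.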
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of John Hogenmiller
> Sent: 17 February 2016 03:04
> To: Василий Ангапов <angapov@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Recomendations for building 1PB RadosGW with Erasure Code
>
> Turns out I didn't do reply-all.
>
> On Tue, Feb 16, 2016 at 9:18 AM, John Hogenmiller <john@xxxxxxxxxxxxxxx> wrote:
>
> > And again - is dual Xeon's power enough for 60-disk node and Erasure Code?
>
> This is something I've been attempting to determine as well. I'm not yet getting
>
> I'm testing with some white-label hardware, but essentially Supermicro 2U Twins with a pair of E5-2609 Xeons and 64GB of memory (http://www.supermicro.com/products/system/2U/6028/SYS-6028TR-HTFR.cfm). This is attached to DAEs with 60 x 6TB drives, in JBOD.
>
> Conversely, Supermicro sells a 72-disk OSD node, which Red Hat considers a supported "reference architecture" device. The processors in those nodes are E5-269 12-core, vs what I have, which is quad-core. http://www.supermicro.com/solutions/storage_ceph.cfm (SSG-6048R-OSD432). I would highly recommend looking at the Supermicro hardware and using that as your reference as well. If you could get an eval unit, use that to compare with the hardware you're working with.
>
> I currently have mine set up with 7 nodes, 60 OSDs each, radosgw running on each node, and 5 ceph monitors. I plan to move the monitors to their own dedicated hardware, and from what I've read, I may only need 3 to manage the 420 OSDs. I am currently just set up for replication instead of EC, though I want to redo this cluster to use EC. Also, I am still trying to work out how much of an impact placement groups have on performance, and I may have a performance-hampering amount.
>
> We test the system using locust speaking S3 to the radosgw. Transactions are distributed equally across all 7 nodes and we track the statistics. We started first emulating 1000 users and got over 4Gbps, but load average on all nodes was in the mid-100s, and after 15 minutes we started getting socket timeouts. We stopped the test, let load settle, and started back at 100 users. We've been running this test about 5 days now. Load average on all nodes floats between 40 and 70. The nodes with ceph-mon running on them do not appear to be taxed any more than the ones without. The radosgw itself seems to take up a decent amount of CPU (running civetweb, no SSL). iowait is non-existent; everything appears to be CPU bound.
>
> At 1000 users, we had 4.3Gbps of PUTs and 2.2Gbps of GETs. Did not capture the TPS on that short test.
> At 100 users, we're pushing 2Gbps in PUTs and 1.24Gbps in GETs. Averaging 115 TPS.
>
> All in all, the speeds are not bad for a single rack, but the CPU utilization is a big concern. We're currently using other (proprietary) object storage platforms on this hardware configuration. They have their own set of issues, but CPU utilization is typically not the problem, even at higher utilization.
>
> root@ljb01:/home/ceph/rain-cluster# ceph status
>     cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
>      health HEALTH_OK
>      monmap e5: 5 mons at {hail02-r01-06=172.29.4.153:6789/0,hail02-r01-08=172.29.4.155:6789/0,rain02-r01-01=172.29.4.148:6789/0,rain02-r01-03=172.29.4.150:6789/0,rain02-r01-04=172.29.4.151:6789/0}
>             election epoch 86, quorum 0,1,2,3,4 rain02-r01-01,rain02-r01-03,rain02-r01-04,hail02-r01-06,hail02-r01-08
>      osdmap e2543: 423 osds: 419 up, 419 in
>             flags sortbitwise
>       pgmap v676131: 33848 pgs, 14 pools, 50834 GB data, 29660 kobjects
>             149 TB used, 2134 TB / 2284 TB avail
>                33848 active+clean
>   client io 129 MB/s rd, 182 MB/s wr, 1562 op/s
>
> # ceph-osd + ceph-mon + radosgw
> top - 13:29:22 up 40 days, 22:05,  1 user,  load average: 47.76, 47.33, 47.08
> Tasks: 1001 total,   7 running, 994 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 39.2 us, 44.7 sy,  0.0 ni,  9.9 id,  2.4 wa,  0.0 hi,  3.7 si,  0.0 st
> KiB Mem:  65873180 total, 64818176 used,  1055004 free,     9324 buffers
> KiB Swap:  8388604 total,  7801828 used,   586776 free. 17610868 cached Mem
>
>    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
> 178129 ceph  20   0 3066452 618060  5440 S  54.6  0.9   2678:49 ceph-osd
> 218049 ceph  20   0 6261880 179704  2872 S  33.4  0.3   1852:14 radosgw
> 165529 ceph  20   0 2915332 579064  4308 S  19.7  0.9 530:12.65 ceph-osd
> 185193 ceph  20   0 2932696 585724  4412 S  19.1  0.9 545:20.31 ceph-osd
>  52334 ceph  20   0 3030300 618868  4328 S  15.8  0.9 543:53.64 ceph-osd
>  23124 ceph  20   0 3037740 607088  4440 S  15.2  0.9 461:03.98 ceph-osd
> 154031 ceph  20   0 2982344 525428  4044 S  14.9  0.8 587:17.62 ceph-osd
> 191278 ceph  20   0 2835208 570100  4700 S  14.9  0.9 547:11.66 ceph-osd
>
> # ceph-osd + radosgw (no ceph-mon)
>
> top - 13:31:22 up 40 days, 22:06,  1 user,  load average: 64.25, 59.76, 58.17
> Tasks: 1015 total,   4 running, 1011 sleeping,   0 stopped,   0 zombie
> %Cpu0  : 24.2 us, 48.5 sy,  0.0 ni, 10.9 id,  1.2 wa,  0.0 hi, 15.2 si,  0.0 st
> %Cpu1  : 30.8 us, 49.7 sy,  0.0 ni, 13.5 id,  1.8 wa,  0.0 hi,  4.2 si,  0.0 st
> %Cpu2  : 33.9 us, 49.5 sy,  0.0 ni, 10.5 id,  2.1 wa,  0.0 hi,  3.9 si,  0.0 st
> %Cpu3  : 31.3 us, 52.4 sy,  0.0 ni, 10.5 id,  2.7 wa,  0.0 hi,  3.0 si,  0.0 st
> %Cpu4  : 34.7 us, 41.3 sy,  0.0 ni, 20.4 id,  3.3 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu5  : 38.3 us, 36.7 sy,  0.0 ni, 21.0 id,  3.7 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu6  : 36.6 us, 37.5 sy,  0.0 ni, 19.8 id,  6.1 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  : 38.0 us, 38.0 sy,  0.0 ni, 19.5 id,  4.3 wa,  0.0 hi,  0.3 si,  0.0 st
> KiB Mem:  65873180 total, 61946688 used,  3926492 free,     1260 buffers
> KiB Swap:  8388604 total,  7080048 used,  1308556 free.  7910772 cached Mem
>
>    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
> 108861 ceph  20   0 6283364 279464  3024 S  27.6  0.4   1684:30 radosgw
>    546 root  20   0       0      0     0 R  23.4  0.0 579:55.28 kswapd0
> 184953 ceph  20   0 2971100 576784  4348 S  23.1  0.9   1265:58 ceph-osd
> 178967 ceph  20   0 2970500 594756  6000 S  18.9  0.9 505:27.13 ceph-osd
> 184105 ceph  20   0 3096276 627944  7096 S  18.0  1.0 581:12.28 ceph-osd
>  56073 ceph  20   0 2888244 542024  4836 S  13.5  0.8 530:41.86 ceph-osd
>  55083 ceph  20   0 2819060 518500  5052 S  13.2  0.8 513:21.96 ceph-osd
> 175578 ceph  20   0 3083096 630712  5564 S  13.2  1.0 591:10.91 ceph-osd
> 180725 ceph  20   0 2915828 553240  4836 S  12.9  0.8 518:22.47 ceph-osd