Re: Ceph SSD CPU Frequency Benchmarks

I think this may be related to what I had to do; it rings a bell at least.

http://unix.stackexchange.com/questions/153693/cant-use-userspace-cpufreq-governor-and-set-cpu-frequency

The intel_pstate driver doesn't support the userspace governor, so you need to disable it and make Linux use the older acpi-cpufreq driver instead.
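
For anyone hitting the same thing, a quick way to see which driver is active is to read the cpufreq entries in sysfs. Below is a minimal Python sketch, assuming the standard /sys/devices/system/cpu/cpu0/cpufreq layout; the function names are only illustrative and do not come from any existing tool.

```python
from pathlib import Path

# Standard cpufreq sysfs location for the first core (assumed layout).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq_info(cpufreq_dir=CPUFREQ_DIR):
    """Return (driver, governors) read from a cpufreq sysfs directory."""
    cpufreq_dir = Path(cpufreq_dir)
    driver = (cpufreq_dir / "scaling_driver").read_text().strip()
    governors = (cpufreq_dir / "scaling_available_governors").read_text().split()
    return driver, governors


def supports_userspace(governors):
    """True if the userspace governor is offered by the active driver."""
    return "userspace" in governors


if __name__ == "__main__":
    try:
        driver, governors = read_cpufreq_info()
    except OSError as exc:
        print("could not read cpufreq information:", exc)
    else:
        print("driver:   ", driver)
        print("governors:", " ".join(governors))
        if not supports_userspace(governors):
            print("the userspace governor is not available with this driver")
```

If the driver shows up as intel_pstate and userspace is missing from the governor list, booting with intel_pstate=disable on the kernel command line is the usual way to fall back to acpi-cpufreq.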

> -----Original Message-----
> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> Sent: 01 September 2015 22:21
> To: 'Robert LeBlanc' <robert@xxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: RE:  Ceph SSD CPU Frequency Benchmarks
> 
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Robert LeBlanc
> > Sent: 01 September 2015 21:48
> > To: Nick Fisk <nick@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re:  Ceph SSD CPU Frequency Benchmarks
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > Nick,
> >
> > I've been trying to replicate your results without success. Can you
> > help me understand what I'm doing that is not the same as your test?
> >
> > My setup is two boxes, one is a client and the other is a server. The
> > server has an Intel(R) Atom(TM) CPU C2750 @ 2.40GHz, 32 GB RAM and 2
> > Intel S3500 240 GB SSD drives. The boxes have InfiniBand FDR cards
> > connected to a QDR switch using IPoIB. I set up OSDs on the 2 SSDs and
> > set pool size=1. I mapped a 200GB RBD using the kernel module and ran
> > fio on the RBD. I adjusted the number of cores, clock speed and
> > C-states of the server and here are my results:
> >
> > Adjusted core number and set the processor to a set frequency using
> > the userspace governor.
> >
> > 8 jobs 8 depth   Cores
> >                   1    2     3     4     5     6     7     8
> > Frequency  2.4  387  762  1121  1432  1657  1900  2092  2260
> > GHz        2    386  758  1126  1428  1657  1890  2090  2232
> >            1.6  382  756  1127  1428  1656  1894  2083  2201
> >            1.2  385  756  1125  1431  1656  1885  2093  2244
> >
> 
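
As a sanity check on the 8 jobs / 8 depth table above, the sketch below works out how throughput scales with core count and how much it varies across frequencies at a fixed core count. The numbers are copied from the table; treating them as IOPS is an assumption, since the message does not state the unit, and the function names are made up for illustration.

```python
# Results copied from the "8 jobs 8 depth" table, keyed by frequency in GHz.
# The unit is assumed to be IOPS; the original message does not say.
CORES = [1, 2, 3, 4, 5, 6, 7, 8]
RESULTS = {
    2.4: [387, 762, 1121, 1432, 1657, 1900, 2092, 2260],
    2.0: [386, 758, 1126, 1428, 1657, 1890, 2090, 2232],
    1.6: [382, 756, 1127, 1428, 1656, 1894, 2083, 2201],
    1.2: [385, 756, 1125, 1431, 1656, 1885, 2093, 2244],
}


def core_speedup(row):
    """Throughput at each core count relative to the single-core result."""
    return [value / row[0] for value in row]


def frequency_spread(results):
    """For each core count, (max - min) / min across all frequencies."""
    columns = zip(*results.values())
    return [(max(col) - min(col)) / min(col) for col in columns]


if __name__ == "__main__":
    for freq, row in RESULTS.items():
        speedups = ", ".join(f"{s:.2f}" for s in core_speedup(row))
        print(f"{freq} GHz speedup by cores: {speedups}")
    for cores, spread in zip(CORES, frequency_spread(RESULTS)):
        print(f"{cores} core(s): spread across frequencies {spread:.1%}")
```

A spread of only a few percent at every core count is what you would expect to see if the requested frequency was not actually being applied.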
> I tested at QD=1 as this tends to highlight the difference in clock speed,
> whereas a higher queue depth will probably scale with both frequency and
> cores. I'm not sure this is your problem, but to make sure your environment
> is doing what you want I would suggest QD=1 and 1 job to start with.
> 
> But thank you for sharing these results regardless of your current frequency
> scaling issues. Information like this is really useful for people trying to decide
> on hardware purchases. Those Atom boards look like they could support 12
> normal HDDs quite happily, assuming 80 IOPS x 12.
> 
> I wonder if we can get enough data from various people to generate an
> IOPS/CPU frequency figure for various CPU architectures?
> 
> 
> > I then adjusted the processor to not go into a deeper sleep state than
> > C1 and also tested setting the highest CPU frequency with the ondemand
> > governor.
> >
> > 1 job 1 depth
> > Cores  1
> >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > Frequency 2.4  381               381                379                 381
> > GHz       2    382               380                381                 381
> >           1.6  380               381                379                 382
> >           1.2  383               378                379                 383
> > Cores  8
> >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > Frequency 2.4  629               580                584                 629
> > GHz       2    630               579                584                 634
> >           1.6  630               579                584                 634
> >           1.2  632               581                582                 634
> >
> > Here I'm seeing a correlation with # cores and C-states, but not
> > frequency.
> >
> > Frequency was controlled with:
> > cpupower frequency-set -d 1.2GHz -u 1.2GHz -g userspace
> > and
> > cpupower frequency-set -d 1.2GHz -u 2.0GHz -g ondemand
> >
> > Core count adjusted by:
> > for i in {1..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
> >
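
The same core-count adjustment can be scripted so that cores are brought back online between runs. This is only a sketch, assuming the usual /sys/devices/system/cpu/cpuN/online files; set_online_cores is a hypothetical helper, it needs root on a real system, and cpu0 is left alone because it often has no online file.

```python
from pathlib import Path

# Standard CPU hotplug location in sysfs (assumed layout).
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_online_cores(count, total, cpu_root=CPU_ROOT):
    """Keep cpu0..cpu(count-1) online and take cpu(count)..cpu(total-1) offline.

    cpu0 is never touched. Returns the CPU numbers whose state was written.
    """
    cpu_root = Path(cpu_root)
    changed = []
    for cpu in range(1, total):
        control = cpu_root / f"cpu{cpu}" / "online"
        if not control.exists():
            continue  # this core cannot be hot-plugged
        control.write_text("1" if cpu < count else "0")
        changed.append(cpu)
    return changed


# Example (as root): leave only cpu0 running, then restore all 8 cores.
#   set_online_cores(1, 8)
#   set_online_cores(8, 8)
```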
> > C-states controlled by:
> > # python
> > Python 2.7.5 (default, Jun 24 2015, 00:41:19)
> > [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> fd = open('/dev/cpu_dma_latency','wb')
> > >>> fd.write('1')
> > >>> fd.flush()
> > >>> fd.close() # Don't run this until the tests are completed (the handle has to stay open).
> > >>>
> >
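
Because the request only lasts while the file handle is open, it may be easier to wrap it in a context manager than to keep an interactive session around. The following is a Python 3 sketch of that idea, not what was used in the tests above; cpu_dma_latency is a hypothetical helper name, and it writes the limit as a native 32-bit integer, which the kernel's PM QoS interface accepts.

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def cpu_dma_latency(max_latency_us, device="/dev/cpu_dma_latency"):
    """Hold a PM QoS latency request for the duration of the with block.

    The request is dropped when the file descriptor is closed, so the
    descriptor stays open until the block exits.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        # Native 32-bit signed integer, latency limit in microseconds.
        os.write(fd, struct.pack("i", max_latency_us))
        yield
    finally:
        os.close(fd)


# Example (as root): keep the limit at 1 us while a benchmark runs.
#   with cpu_dma_latency(1):
#       input("Run the test now, then press Enter to release the request")
```

Running the benchmark inside the with block means the request is released automatically when the test finishes.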
> > I'd like to replicate your results. I'd also like it if you could verify
> > some of mine in your setup around C-states and cores.
> 
> I can't remember exactly, but I think I had to do something to get the
> userspace governor to behave as I expected it to. I tend to recall setting the
> frequency low and yet still seeing it bursting up to max. I will have a look
> through my notes tomorrow and see if I can recall anything. One thing I do
> remember though is that the Intel powertop utility was very useful in
> confirming what the actual CPU frequency was. It might be worth installing
> and running this and seeing what the CPU cores are doing.
> 
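
Alongside powertop, the kernel's own view of each core's frequency can be read from sysfs. A minimal sketch, assuming the standard cpufreq layout where scaling_cur_freq is reported in kHz; current_frequencies_mhz is an illustrative name. This shows what the kernel reports, which is not guaranteed to match what the hardware is really doing, so it complements powertop rather than replacing it.

```python
from pathlib import Path


def current_frequencies_mhz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: current frequency in MHz} as reported by cpufreq."""
    freqs = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_cur_freq"
    for path in sorted(Path(cpu_root).glob(pattern)):
        cpu = path.parent.parent.name  # e.g. "cpu0"
        freqs[cpu] = int(path.read_text()) / 1000  # sysfs value is in kHz
    return freqs


if __name__ == "__main__":
    for cpu, mhz in current_frequencies_mhz().items():
        print(f"{cpu}: {mhz:.0f} MHz")
```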
> 
> >
> > Thanks,
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: Mailvelope v1.0.2
> > Comment: https://www.mailvelope.com
> >
> >
> > wsFcBAEBCAAQBQJV5g8GCRDmVDuy+mK58QAAe6YP/j+SNGFI2z7ndnbOk87D
> > UjxG+hiZT5bkdt2/wVfI6QiH0UGDA3rLBsttOHPgfxP6/CEy801q8/fO0QOk
> > tLxIgX01K4ECls2uhiFAM3bhKalFsKDM6rHYFx96tIGWonQeou36ouDG8pfz
> > YsprvQ2XZEX1+G4dfZZ4lc3A3mfIY6Wsn7DC0tup9eRp3cl9hQLXEu4Zg8CZ
> > 7867FNaud4S4f6hYV0KUC0fv+hZvyruMCt/jgl8gVr8bAdNgiW5u862gsk5b
> > sO9mb7H679G8t47m3xd89jTh9siMshbcakF9PXKzrN7DxBb/sBuN3GykesZA
> > +5jdUTzPCxFu+LocJ91by8FybatpLwxycmfP2gRxd/owclXk5BqqJUnrdYVm
> > n2GcHobdHVv9k/s+iBVV0xbwqOY+IO9UNUfLAKNy7E1xtpXdTpQBuokmu/4D
> > WXg3C4u+DsZNvcziO4s/edQ1koOQm1Fcj5VnbouSqmsHpB5nHeJbGmiKNTBA
> > 9pE/hTph56YRqOE3bq3X/ohjtziL7/e/MVF3VUisDJieaLxV9weLxKIf0W9t
> > L7NMhX7iUIMps5ulA9qzd8qJK6yBa65BVXtk5M0A5oTA/VvxHQT6e5nSZS+Z
> > WLjavMnmSSJT1BQZ5GkVbVqo4UVjndcXEvkBm3+McaGKliO2xvxP+U3nCKpZ
> > js+h
> > =4WAa
> > -----END PGP SIGNATURE-----
> >
> >
> > ----------------
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >
> > On Sat, Jun 13, 2015 at 8:58 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > Hi All,
> >
> > I know there have been lots of discussions around needing fast CPUs to
> > get the most out of SSDs. However, I have never really seen any
> > solid numbers to make a comparison about how much difference a faster
> > CPU makes and whether Ceph scales linearly with clock speed. So I did a
> > little experiment today.
> >
> > I set up a 1 OSD Ceph instance on a desktop PC. The desktop has an i5
> > Sandy Bridge CPU with the CPU turbo overclocked to 4.3GHz. By using
> > the userspace governor in Linux, I was able to set static clock speeds
> > to see the possible performance effects on Ceph. My PC only has an old
> > X25M-G2 SSD, so I had to limit the IO testing to 4KB QD=1, as
> > otherwise the SSD ran out of puff when I got to the higher clock
> > speeds.
> >
> > CPU MHz   4KB Write IO   Min Latency (us)   Avg Latency (us)   CPU usr   CPU sys
> > 1600      797            886                1250               10.14     2.35
> > 2000      815            746                1222               8.45      1.82
> > 2400      1161           630                857                9.5       1.6
> > 2800      1227           549                812                8.74      1.24
> > 3300      1320           482                755                7.87      1.08
> > 4300      1548           437                644                7.72      0.9
> >
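
One way to cross-check the table above: at QD=1 only one IO is in flight, so IOPS should be close to one second divided by the average latency. The sketch below compares the two columns, assuming the write IO column is IOPS (the header does not state the unit); the names are illustrative.

```python
# (CPU MHz, 4KB write IO, average latency in microseconds), from the table.
ROWS = [
    (1600, 797, 1250),
    (2000, 815, 1222),
    (2400, 1161, 857),
    (2800, 1227, 812),
    (3300, 1320, 755),
    (4300, 1548, 644),
]


def expected_iops(avg_latency_us):
    """IOPS implied by a given average latency with one IO in flight."""
    return 1_000_000 / avg_latency_us


def relative_errors(rows):
    """Relative gap between measured IOPS and the latency-implied IOPS."""
    return [abs(expected_iops(lat) - iops) / iops for _, iops, lat in rows]


if __name__ == "__main__":
    for (mhz, iops, lat), err in zip(ROWS, relative_errors(ROWS)):
        print(f"{mhz} MHz: measured {iops}, implied {expected_iops(lat):.0f} "
              f"({err:.2%} apart)")
```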
> > The figures show a fairly linear trend right through the clock range
> > and clearly show the importance of having fast CPUs (GHz, not cores)
> > if you want to achieve high IO, especially at low queue depths.
> >
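
To put a number on "fairly linear", here is a small least-squares fit over the six data points from the table, written in plain Python so it has no dependencies. The data is copied from the table; linear_fit is an illustrative helper, and the write IO column is again assumed to be IOPS.

```python
# Clock speed (MHz) and 4KB write IO at QD=1, copied from the table.
MHZ = [1600, 2000, 2400, 2800, 3300, 4300]
IOPS = [797, 815, 1161, 1227, 1320, 1548]


def linear_fit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    slope, intercept, r_squared = linear_fit(MHZ, IOPS)
    print(f"slope:     {slope:.3f} IOPS per MHz")
    print(f"intercept: {intercept:.0f} IOPS")
    print(f"R^2:       {r_squared:.2f}")
```

On these six points the slope comes out at roughly 0.29 IOPS per MHz with an intercept well above zero, so the trend looks linear but not proportional: going from 1600 to 4300 MHz is about 2.7x the clock for a bit under 2x the IO.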
> >
> > Things to Note
> > - These figures are from a desktop CPU; no doubt Xeons will be
> >   slightly faster at the same clock speed.
> > - I am assuming that using the userspace governor in this way is a
> >   realistic way to simulate different CPU clock speeds?
> > - My old SSD is probably skewing the figures slightly.
> > - I have complete control over the turbo settings and big cooling;
> >   many server CPUs will limit the max turbo if multiple cores are
> >   under load or get too hot.
> > - Ceph SSD OSD nodes are probably best with high-end E3 CPUs, as
> >   they have the highest clock speeds.
> > - HDDs with journals will probably benefit slightly from higher clock
> >   speeds, if the disk isn't the bottleneck (i.e. small block
> >   sequential writes).
> > - These numbers are for Replica=1; at 2 or 3 these numbers will be at
> >   least halved, I would imagine.
> >
> >
> > I hope someone finds this useful
> >
> > Nick
> >
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
