"It depends" is the right answer, IMHO. There are advantages to building
smaller single-socket, high-frequency nodes. The CPUs are cheap, which
helps offset the cost of the lower-density nodes, and as has been
mentioned in this thread you don't have to deal with NUMA pinning and
other annoying complications, which can ultimately cause more pain in
the long run than they're worth.
On the other hand, if you are trying to squeeze as many SSDs into a
single box as possible and know what you are doing regarding NUMA
pinning, you'll probably benefit from dual CPUs with lots of cores.
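For reference, a minimal sketch of what that pinning can look like (the
NUMA node number and OSD id below are only examples, and the exact
invocation depends on how your OSDs are started):

   # show how cores and memory are split across NUMA nodes
   lscpu | grep -i numa

   # start one OSD bound to the CPUs and memory of NUMA node 0, ideally
   # the node its NVMe/HBA and NIC hang off of
   numactl --cpunodebind=0 --membind=0 ceph-osd -i 12 --cluster ceph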
Each kind of setup has its place. In our QA lab we just bought
high-frequency single-socket systems with a single P3700 NVMe each to
chew through the nightly Ceph testing. We also have dual-socket nodes
with lots of cores, multiple NVMe drives, and multiple hard drives in
the same box.
Mark
On 01/20/2016 07:14 AM, Wade Holler wrote:
Great commentary.
While it is fundamentally true that higher clock speed equals lower
latency, in my practical experience we are more often interested in
latency at the concurrency profile of the applications.
So in this regard I favor more cores when I have to choose, so that we
can support more concurrent operations at a queue depth of 0.
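One quick way to see that is to measure latency at different queue
depths, e.g. with fio against an RBD image (pool and image names below
are just placeholders, and fio needs to be built with rbd support):

   fio --name=qd1 --ioengine=rbd --clientname=admin --pool=rbd \
       --rbdname=testimg --rw=randwrite --bs=4k --iodepth=1 \
       --runtime=60 --time_based
   fio --name=qd32 --ioengine=rbd --clientname=admin --pool=rbd \
       --rbdname=testimg --rw=randwrite --bs=4k --iodepth=32 \
       --runtime=60 --time_based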
Cheers
Wade
On Wed, Jan 20, 2016 at 7:58 AM Jan Schermer <jan@xxxxxxxxxxx> wrote:
I'm using Ceph with all SSDs; I doubt you have to worry about speed that
much with HDDs (it will be abysmal either way).
With SSDs you need to start worrying about processor caches and memory
locality in NUMA systems; the Linux scheduler is not really that smart
right now.
Yes, the process will get its own core, but it might be a different core
every time it wakes up, and this increases latencies considerably once
you start hammering the OSDs on the same host.
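Purely as an illustration (the core list and OSD id are made up), you
can pin an already-running OSD so it stops migrating between cores:

   # find the PID of OSD 3 and keep it on cores 0-5 of socket 0
   OSD_PID=$(pgrep -f 'ceph-osd -i 3')
   taskset -cp 0-5 "$OSD_PID"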
But as always, YMMV ;-)
Jan
> On 20 Jan 2016, at 13:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Jan,
>
> actually the Linux kernel does this automatically anyway (it schedules
> new processes onto idle or lightly used cores).
>
> A single scrubbing/recovery (or whatever) process won't take more than
> 100% CPU (one core), because these processes are technically not able
> to run multi-threaded.
>
> Of course, if you configure your Ceph to have (up to) 8 backfill
> processes, then 8 processes will start, which can utilize (up to) 8
> CPU cores.
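> In ceph.conf that would be something roughly like (the 8 here is just
> an example value, not a recommendation):
>
>    [osd]
>    osd max backfills = 8
>    osd recovery max active = 8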
>
> But still, a single process won't be able to use more than one
> CPU core.
>
> ---
>
> In a situation where you have 2x E5-2620 v3, for example, you have
> 2 x 6 cores x 2 HT threads = 24 threads (vCores).
>
> So if you run 24 OSDs in such a system, every OSD will automatically
> have (mathematically) its "own" CPU core.
>
> Such a combination will perform better than a single E5 CPU with a
> much higher frequency (but still the same amount of cores).
>
> CPUs of this kind are so fast that the physical drive (no matter if
> SAS/SSD/ATA) will not be able to overload the CPU (no matter which
> CPU of this kind you use).
>
> It's like playing games: if the game is running smoothly, it does not
> matter whether it's running on a 4 GHz machine at 40% utilization or
> on a 2 GHz machine at 80% utilization. It's running smoothly; it
> cannot do any better. :-)
>
> So if your data is coming as fast as the HDD can physically deliver
> it, it does not matter whether the CPU runs at 2, 3, 4 or 200 GHz;
> you are already at the maximum of what the HDD can deliver.
>
> So as long as the HDDs don't get faster, the CPUs do not need to be
> faster.
>
> The Ceph storage is usually just delivering data, not running a
> commercial webserver or whatever else besides that.
>
> So when you are deciding which CPU to choose, you only have to think
> about how fast your HDD devices are, so that the CPU does not become
> the bottleneck.
>
> And the more cores you have, the lower the chance that different
> requests will block each other.
>
> ----
>
> So all in all, cores > frequency, always (as long as you use fast,
> up-to-date CPUs). If you are using old CPUs, you of course have to
> make sure that the performance of the CPU (which, by the way, does
> not depend only on frequency) is sufficient so that it does not
> throttle the HDDs' data output.
>
>
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> On 20.01.2016 at 13:10, Jan Schermer wrote:
>> This is very true, but do you actually exclusively pin the cores to
>> the OSD daemons so they don't interfere?
>> I don't think many people do that; it wouldn't work with more than a
>> handful of OSDs.
>> The OSD might typically only need <100% of one core, but during
>> startup or some reshuffling it's beneficial to allow it to get more
>> (>400%), and that will interfere with whatever else was pinned
>> there...
>>
>> Jan
>>
>>> On 20 Jan 2016, at 13:07, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> Cores > Frequency
>>>
>>> If you think about recovery/scrubbing tasks, it's better when a CPU
>>> core can be assigned to do this, compared to a situation where the
>>> same CPU core needs to recover/scrub and still deliver production
>>> content at the same time.
>>>
>>> The more you can create a situation where an OSD has its "own" CPU
>>> core, the better it is. Modern CPUs are anyway so fast that even
>>> SSDs can't run the CPUs to their limit.
>>>
>>> --
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Oliver Dzombic
>>> IP-Interactive
>>>
>>> mailto:info@xxxxxxxxxxxxxxxxx
>>>
>>> Anschrift:
>>>
>>> IP Interactive UG ( haftungsbeschraenkt )
>>> Zum Sonnenberg 1-3
>>> 63571 Gelnhausen
>>>
>>> HRB 93402 beim Amtsgericht Hanau
>>> Geschäftsführung: Oliver Dzombic
>>>
>>> Steuer Nr.: 35 236 3622 1
>>> UST ID: DE274086107
>>>
>>>
>>> On 20.01.2016 at 10:01, Götz Reinicke - IT Koordinator wrote:
>>>> Hi folks,
>>>>
>>>> we plan to use more SSD OSDs in our first cluster layout instead
>>>> of SAS OSDs (more IO is needed than space).
>>>>
>>>> Short question: what would influence the performance more, more
>>>> cores or more GHz per core?
>>>>
>>>> Or is it, as always: depends on the total of
>>>> OSDs/nodes/repl-level/etc.? ... :)
>>>>
>>>> If needed, I can give some more detailed information on the layout.
>>>>
>>>> Thanks for feedback. Götz
>>>>
>>>>
>>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com