Re: Ceph random read IOPS

On 2017-06-26 15:34, Willem Jan Withagen wrote:

On 26-6-2017 09:01, Christian Wuerdig wrote:
Well, preferring faster-clocked CPUs for SSD scenarios has been floated
several times over the last few months on this list. And realistic or
not, Nick's and Kostas' setups are similar enough (testing a single disk)
that it's a distinct possibility.
Anyway, as mentioned, measuring the performance counters would probably
provide more insight.
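
(A minimal sketch of pulling those counters from a running OSD over its
admin socket; Python, with osd.0 as a stand-in for whichever OSD is under
test:)

import json
import subprocess

def osd_perf_dump(osd_id):
    # Read the OSD's internal perf counters via "ceph daemon osd.N perf dump".
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    return json.loads(out)

counters = osd_perf_dump(0)
# The op latency counters (avgcount/sum pairs) are typically under the "osd" section.
print(json.dumps(counters.get("osd", {}), indent=2))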

I read the advice as:
    prefer GHz over cores.

And especially since there is a trade-off between GHz and cores, that
can be an expensive one. Getting both means you have to pay
substantially more money.

And for an average Ceph server with plenty of OSDs, I personally just
don't buy that. There you'd have to look at the total throughput of the
system, and latency is only one of many factors.

Let alone in a cluster with several hosts (and/or racks), where the
latency is dictated by the network. A bad choice of network card or
switch will outweigh any extra cycles that your CPU can burn.

I think that testing just 1 OSD is testing artifacts, and has very
little to do with running an actual Ceph cluster.

So if one would like to test this, the test setup should be something
like: 3 hosts with 3 disks per host, a replicated pool with min_size=2,
and a realistic workload.
Then turn the GHz knob and see what happens to client latency and
throughput, for example along the lines of the sketch below.
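
A rough Python sketch of such a sweep (assuming the cpufreq sysfs knobs
are writable as root and a test pool exists; the frequency steps, pool
name and thread count are placeholders, and the cap would have to be
applied on every host):

import glob
import subprocess

FREQS_KHZ = [1200000, 1800000, 2400000, 3000000]  # placeholder frequency steps
POOL = "benchpool"                                # placeholder test pool

def cap_cpu_freq(khz):
    # Cap every core on this host; repeat on all hosts in the test cluster.
    for path in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

def run_bench(seconds=60, threads=16):
    # 4K writes; rados bench prints bandwidth plus average/max latency.
    return subprocess.check_output(
        ["rados", "bench", "-p", POOL, str(seconds), "write",
         "-b", "4096", "-t", str(threads)], universal_newlines=True)

for khz in FREQS_KHZ:
    cap_cpu_freq(khz)
    print("=== capped at %d MHz ===" % (khz // 1000))
    print(run_bench())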

--WjW
 
 
In a high-concurrency/high-queue-depth situation, which is probably the most common workload, there is no question that adding more cores will increase IOPS almost linearly, provided you have enough disk and network bandwidth, i.e. your disk and network utilization is low and your CPU is near 100%. Adding more cores is also a more economical way to increase IOPS than raising the frequency.
But adding more cores will not lower latency below the value you get from the QD=1 test. To achieve lower latency you need a higher CPU frequency. Yes, it is expensive, and as you said you also need lower-latency switches and so on, but you simply have to pay more to achieve this.
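
To put rough numbers on it (Little's law: IOPS ≈ queue depth / average
latency; the figures below are made up purely for illustration):

def iops(queue_depth, latency_ms):
    # Little's law: throughput = concurrency / per-op latency.
    return queue_depth / (latency_ms / 1000.0)

print(iops(1, 1.0))   # QD=1 at 1.0 ms/op  -> ~1,000 IOPS, set by the latency floor
print(iops(32, 1.0))  # QD=32, enough cores -> ~32,000 IOPS, scales with concurrency
print(iops(1, 0.7))   # only a faster core (0.7 ms/op) improves the QD=1 figure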
 
/Maged

On Sun, Jun 25, 2017 at 4:53 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:



    On 24 Jun 2017, at 14:17, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:

    My understanding was that this test targets latency more than
    IOPS, which is probably why it was run with QD=1. It also makes
    sense that CPU frequency will matter more there than core count.


    But then it is not generic enough to be used as general advice!
    It is just a line in 3D space, as there are so many other variables.

    --WjW

    On 2017-06-24 12:52, Willem Jan Withagen wrote:

    On 24-6-2017 05:30, Christian Wuerdig wrote:
    The general advice floating around is that you want CPUs with high
    clock speeds rather than more cores to reduce latency and increase
    IOPS for SSD setups (see also
    http://www.sys-pro.co.uk/ceph-storage-fast-cpus-ssd-performance/).
    So something like an E5-2667V4 might bring better results in that
    situation.
    Also there was some talk about disabling the processor C-states in
    order to bring latency down (something like this should be easy to
    test: https://stackoverflow.com/a/22482722/220986).
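
    (If that answer is the /dev/cpu_dma_latency trick, a minimal Python
    sketch would be the following; it needs root, and the request only
    holds while the file descriptor stays open:)

import os
import struct
import time

# Ask the PM QoS interface for zero additional wakeup latency, which keeps
# the CPUs out of deep C-states for as long as this file descriptor is open.
fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
os.write(fd, struct.pack("i", 0))
try:
    time.sleep(3600)  # hold the constraint for the duration of the benchmark
finally:
    os.close(fd)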

    I would be very careful to call this general advice...

    Although the article is interesting, it is rather one-sided.

    The only thing it shows is that there is a linear relation between
    clock speed and write or read speeds???
    The article is rather vague on how and what is actually tested.

    By just running a single OSD with no replication, a lot of the
    functionality is left out of the equation.
    Nobody runs just 1 OSD on a box in a normal cluster host.

    Not using a serious SSD is another source of noise in the conclusion.
    A higher queue depth can/will certainly have an impact on concurrency.

    I would call this an observation, and nothing more.

    --WjW

    On Sat, Jun 24, 2017 at 1:28 AM, Kostas Paraskevopoulos
    <reverend.x3@xxxxxxxxx> wrote:

        Hello,

        We are in the process of evaluating the performance of a testing
        cluster (3 nodes) with Ceph Jewel. Our setup consists of:
        3 monitors (VMs)
        2 physical servers, each connected to 1 JBOD, running Ubuntu
        Server 16.04

        Each server has 32 threads @2.1GHz and 128GB RAM.
        The disk distribution per server is:
        38 * HUS726020ALS210 (SAS rotational)
        2 * HUSMH8010BSS200 (SAS SSD for journals)
        2 * ST1920FM0043 (SAS SSD for data)
        1 * INTEL SSDPEDME012T4 (NVMe, measured with fio at ~300K IOPS)

        Since we don't currently have a 10Gbit switch, we test the
        performance with the cluster in a degraded state, the noout flag
        set, and we mount RBD images on the powered-on OSD node. We
        confirmed that the network is not saturated during the tests.

        We ran tests on the NVMe disk and the pool created on it, where
        we hoped to get the most performance without being limited by the
        hardware specs, since we have more disks than CPU threads.

        The NVMe disk was at first set up with a single data partition
        and the journal on the same disk. The performance on random 4K
        reads topped out at 50K IOPS. We then removed the OSD and
        repartitioned with 4 data partitions and 4 journals on the same
        disk. The performance didn't increase significantly. Also, since
        we run read tests, the journals shouldn't cause performance
        issues.

        We then ran 4 fio processes in parallel on the same mounted RBD
        image and the total IOPS reached 100K. More parallel fio
        processes didn't increase the measured IOPS.
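
        (A rough sketch of that kind of parallel run, in Python;
        /dev/rbd0, the runtime and the iodepth are placeholders rather
        than the exact values used:)

import subprocess

DEVICE = "/dev/rbd0"   # placeholder for the mounted/mapped RBD image
NUM_PROCS = 4

def fio_cmd(name):
    # One 4K random-read job per process, direct I/O against the block device.
    return ["fio", "--name=%s" % name, "--filename=%s" % DEVICE,
            "--ioengine=libaio", "--direct=1", "--rw=randread",
            "--bs=4k", "--iodepth=32", "--runtime=60", "--time_based",
            "--group_reporting"]

# Start the fio processes in parallel and wait for all of them to finish.
procs = [subprocess.Popen(fio_cmd("job%d" % i)) for i in range(NUM_PROCS)]
for p in procs:
    p.wait()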

        Our ceph.conf is pretty basic (debug is set to 0/0 for
        everything) and the CRUSH map just defines the different
        buckets/rules for the disk separation (rotational, SSD, NVMe)
        in order to create the required pools.

        Is a performance of 100,000 IOPS for random 4K reads normal for
        a disk that reaches more than 300K IOPS in the same benchmark on
        the same hardware, or are we missing something?

        Best regards,
        Kostas




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
