Re: Ceph Performance

Hi Mark,

OK, this is probably getting into the low-level details of the OSD protocol now. I've been testing with tgt today to see if it makes a difference, but unfortunately it doesn't.

I did, however, notice that with only a single RBD image, tgt has only 12 OSD connections (TCP) - presumably one per OSD?

Does this mean that even if I increase tgt's thread count, it will only be able to do 12 parallel IOs (because that's how many OSDs I have)?

This might explain why the performance is not so good - on each connection it can only do one transaction at a time:

1) Submit write
2) Wait...
3) Receive ACK

Then repeat...

But if the OSD protocol supports multiple outstanding transactions, it could do something like this:

1) Submit 1
2) Submit 2
3) Submit 3
4) Recv ACK 1
5) Submit 4
6) Recv ACK 2

etc...

Of course I may be making totally wrong assumptions, but if this is the case then more OSDs would surely give better performance, as my tests are using an IO depth of 256.
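
To illustrate what I mean (purely an untested sketch - the pool and object names are made up, and it assumes the librados Python bindings are installed), this is the kind of pipelining I'm hoping already happens on each OSD connection:

    import rados

    # Connect to the cluster and open a pool (names are placeholders).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    QUEUE_DEPTH = 256          # same IO depth as my fio tests
    IO_SIZE = 4096             # 4 KB per write
    data = b'\0' * IO_SIZE

    # Submit every write up front instead of waiting for each ACK,
    # i.e. many transactions outstanding on the same OSD connections.
    completions = []
    for i in range(QUEUE_DEPTH):
        comp = ioctx.aio_write('test-object-%d' % i, data, offset=0)
        completions.append(comp)

    # Only now wait for the ACKs to come back.
    for comp in completions:
        comp.wait_for_complete()

    ioctx.close()
    cluster.shutdown()

If librados already pipelines like this internally, then the serialisation must be happening somewhere higher up (tgt itself, perhaps).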

Regards
--
Brad.



On 10 January 2014 14:32, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
On 01/10/2014 03:08 AM, Bradley Kite wrote:
On 9 January 2014 16:57, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

    On 01/09/2014 10:43 AM, Bradley Kite wrote:

        On 9 January 2014 15:44, Christian Kauhaus <kc@xxxxxxxxxx> wrote:

             On 09.01.2014 10:25, Bradley Kite wrote:
              > 3 servers (quad-core CPU, 16GB RAM), each with 4 SATA 7.2K RPM
              > disks (4TB) plus a 160GB SSD.
              > [...]
              > By comparison, a 12-disk RAID5 iSCSI SAN is doing ~4000 read
              > IOPS and ~2000 write IOPS (but with 15K RPM SAS disks).

             I think that comparing Ceph on 7.2k rpm SATA disks against
             iSCSI on 15k rpm SAS disks is not fair. The random access
             times of 15k SAS disks are hugely better compared to 7.2k
             SATA disks. What would be far more interesting is to
             compare Ceph against iSCSI with identical disks.

             Regards

             Christian



        Hi Christian,

        Yes, for a true comparison it would be better, but this is the
        only iSCSI SAN that we have available for testing, so I really
        only compared against it to get a "gut feel" for relative
        performance.

        I'm still looking for clues that might indicate why there is
        such a huge difference between the read & write rates on the
        Ceph cluster though.


    One thing you may want to look at is some comparisons we did with
    fio on different RBD volumes with varying IO depths and volume/guest
    counts:

    http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-2-4k-rbd-performance/

    You'll probably be most interested in the 4k random read/write
    results for XFS.  It would be interesting to see if you saw any
    difference with more or fewer volumes at different IO depths.
    Also, sorry if I missed it, but is this QEMU/KVM?  If so, did you
    enable RBD cache?
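
    For reference, a 4k random-write run of that sort against a mapped
    RBD device boils down to something like the following (device path
    and numbers are placeholders, and it writes to the device, so only
    run it against a scratch image):

        fio --name=rbd-test --filename=/dev/rbd0 --ioengine=libaio \
            --direct=1 --rw=randwrite --bs=4k --iodepth=256 --runtime=60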



Hi Mark,

Thanks for your very detailed test results.

Your results are interesting, and suggest that there is a significant
performance difference between kernel RBD mapping and QEMU/KVM (which
uses librbd directly) - shown particularly here, where KRBD achieves
23 MB/sec vs 500 MB/sec for librbd:

http://ceph.com/wp-content/uploads/2014/07/cuttlefish-rbd_xfs-write-0004K.png

Particularly for small sequential IO with the direct flag, RBD cache does help dramatically as it allows all of those little sequential IOs to be coalesced.  I suspect you will notice some benefit with random direct IO (assuming you are using moderate-sized VM images), but not nearly as much.  Buffered IO is interesting because now you have the Linux buffer cache involved and the results may be quite different.
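
For reference (a minimal sketch, with everything else left at its default), the cache is enabled from the client side in ceph.conf:

    [client]
        rbd cache = true

and with QEMU/KVM the drive's cache mode also needs to allow it (e.g. cache=writeback).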



Our end-goal is to use QEMU/KVM, so this is very promising. Would you
happen to have the raw IOPS figures from your tests? The graphs only
show throughput, which provides a good comparison, but for us IOPS is
the most important factor.

I'll have to check and see if I can get you the original data.  You can work out the IOPS manually too, by taking the MB/s throughput and the IO size and dividing:  X MB/s * 1024 / 4 KB = Y IOPS (at 4 KB).
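
Purely as an illustration of the arithmetic, using the two figures you quoted above: 23 MB/s * 1024 / 4 KB ≈ 5,900 IOPS for the kernel client, and 500 MB/s * 1024 / 4 KB = 128,000 IOPS for librbd with the cache enabled.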



Would you happen to know if stgt (iSCSI) uses the kernel module or
librbd? We also have some legacy Hyper-V hosts that we would like to
connect (to avoid rebuilding them).

Not sure. I suspect it's not using any of the kernel code, but someone else can probably say for sure.



Is it generally recommended to avoid the kernel module where possible?

Not necessarily, but there are trade-offs.  The kernel module may be faster in some ways as it's more bare-metal, but it's also much more difficult to implement things like write back caching and other features in it.  I would stick with the QEMU driver if you can use it, but the kernel module may have advantages in some situations (say if you do entirely sequential large reads/writes).
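
If it helps, a QEMU drive definition using the librbd driver looks something like this (pool/image names are made up here, and the exact option syntax can vary a little between QEMU versions):

    qemu-system-x86_64 ... \
        -drive format=raw,if=virtio,cache=writeback,file=rbd:rbd/myimage:conf=/etc/ceph/ceph.conf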


Regards
--
Brad.




