Re: Performance issues with small files

That's correct.  We created 65k buckets, using two hex characters as the naming convention, then stored the files in each container based on the first two characters of the file name.  The end result was 20-50 files per bucket.  Once all of the buckets were created and files were being loaded, we still observed latency increasing over time.
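
A minimal Python sketch of the prefix-based layout described above. It assumes the file names start with hex characters, and the `files-` bucket-name prefix is hypothetical. The arithmetic is worth checking: a two-character hex prefix gives only 16^2 = 256 distinct buckets, while roughly 65k buckets (65,536 = 16^4) corresponds to a four-character prefix. For that reason `prefix_len` is a parameter here.

```python
HEX_DIGITS = set("0123456789abcdef")


def bucket_for(filename: str, prefix_len: int = 2, base: str = "files-") -> str:
    """Pick a bucket (Swift container) from the leading hex characters of a file name.

    The "files-" prefix is a hypothetical naming choice, not taken from the thread.
    """
    key = filename[:prefix_len].lower()
    if len(key) < prefix_len or not set(key) <= HEX_DIGITS:
        raise ValueError(f"{filename!r} does not start with {prefix_len} hex characters")
    return base + key


def bucket_count(prefix_len: int) -> int:
    # Each hex character can take 16 values, so n characters give 16**n buckets.
    return 16 ** prefix_len


print(bucket_for("a3f9c1e7.jpg"))                # files-a3
print(bucket_for("a3f9c1e7.jpg", prefix_len=4))  # files-a3f9
print(bucket_count(2), bucket_count(4))          # 256 65536
```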

Is there a way to disable indexing?  Or are there other settings you can suggest to attempt to speed this process up?


On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
Just for clarification, distributing objects over lots of buckets isn't helping improve small object performance?

The degradation over time is similar to something I've seen in the past, with higher numbers of seeks on the underlying OSD device over time.  Is it always (temporarily) resolved by writing to a new empty bucket?

Mark


On 09/04/2013 02:45 PM, Bill Omer wrote:
We've actually done the same thing, creating 65k buckets and storing
20-50 objects in each.  No real change, nothing noticeable anyway.


On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
<bstillwell@xxxxxxxxxxxxxxx> wrote:

    So far I haven't seen much of a change.  It's still working through
    removing the bucket that reached 1.5 million objects though (my
    guess is that'll take a few more days), so I believe that might have
    something to do with it.

    Bryan


    On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson
    <mark.nelson@xxxxxxxxxxx> wrote:

        Bryan,

        Good explanation.  How's performance now that you've spread the
        load over multiple buckets?

        Mark

        On 09/04/2013 12:39 PM, Bryan Stillwell wrote:

            Bill,

            I've run into a similar issue with objects averaging ~100KiB.
            The explanation I received on IRC is that there are scaling
            issues if you upload them all to the same bucket, because the
            bucket index isn't sharded.  The recommended solution is to
            spread the objects out over a lot of buckets.  However, that
            ran me into another issue once I hit 1000 buckets, which is
            the default per-user limit.  I made the limit unlimited with
            this command:

            radosgw-admin user modify --uid=your_username --max-buckets=0
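
Once objects are spread over many buckets, every upload needs a stable mapping from object name to bucket. The following is a minimal Python sketch. It assumes a fixed bucket count chosen up front (1000 here, so a user stays within the 1000-bucket per-user limit mentioned above) and uses a hypothetical `objs-NNN` naming scheme. It hashes the object name with MD5, for its even spread rather than for security, so unlike a prefix scheme it works for arbitrary names.

```python
import hashlib
from collections import Counter


def bucket_index(object_name: str, num_buckets: int) -> int:
    """Map an object name to a bucket number in [0, num_buckets) using a stable hash."""
    digest = hashlib.md5(object_name.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned big-endian integer.
    return int.from_bytes(digest[:8], "big") % num_buckets


def bucket_name(object_name: str, num_buckets: int = 1000, base: str = "objs") -> str:
    # Zero-pad the index so every name has the same width; "objs" is hypothetical.
    width = len(str(num_buckets - 1))
    return f"{base}-{bucket_index(object_name, num_buckets):0{width}d}"


# The same name always lands in the same bucket, and many names spread evenly.
counts = Counter(bucket_index(f"photo-{i}.jpg", 10) for i in range(10_000))
print(bucket_name("photo-1.jpg"), min(counts.values()), max(counts.values()))
```

With modulo hashing, changing `num_buckets` later remaps most existing names. The bucket count is therefore best fixed before any data is loaded.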

            Bryan


            On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer
            <bill.omer@xxxxxxxxx> wrote:

                 I'm testing Ceph for storing a very large number of small
                 files.  I'm seeing some performance issues and would like
                 to see if anyone could offer any insight as to what I could
                 do to correct this.

                 Some numbers:

                 I uploaded 184111 files, with an average file size of 5KB,
                 using 10 separate servers to upload the requests with Python
                 and the cloudfiles module.  I stopped uploading after 53
                 minutes, which works out to an average of roughly 5.8 files
                 per second per node.
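
The per-node figure follows directly from the totals. 184,111 files in 53 minutes (3,180 s) comes to about 57.9 files/s for the whole run, or about 5.79 files/s for each of the 10 upload servers. A quick Python check:

```python
def per_node_rate(files: int, minutes: float, nodes: int) -> float:
    """Average upload rate, in files per second, for each uploading node."""
    return files / (minutes * 60) / nodes


cluster_rate = 184_111 / (53 * 60)  # files/s across all 10 upload servers
node_rate = per_node_rate(184_111, 53, 10)
print(f"{cluster_rate:.1f} files/s total, {node_rate:.2f} files/s per node")
# prints: 57.9 files/s total, 5.79 files/s per node
```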


                 My storage cluster consists of 21 OSDs across 7 servers,
                 with their journals written to SSD drives.  I've done a
                 default installation, using ceph-deploy with the dumpling
                 release.

                 I'm using statsd to monitor the performance, and what's
                 interesting is that when I start with an empty bucket,
                 performance is amazing, with average response times of
                 20-50ms.  However, as time goes on, the response times climb
                 into the hundreds of milliseconds, and the average number of
                 uploads per second drops.
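
Per-request latency, like the 20-50ms figures above, can be captured by timing each upload and reporting it as a statsd timer. The following minimal Python sketch uses statsd's plain-text UDP format (`<metric>:<value>|ms`). The host, the port (8125 is statsd's conventional default), and the metric name are all assumptions.

```python
import socket
import time
from contextlib import contextmanager


def statsd_timing_packet(metric: str, millis: float) -> bytes:
    """Encode a statsd timer sample as '<metric>:<value>|ms'."""
    return f"{metric}:{millis:.0f}|ms".encode("ascii")


@contextmanager
def timed(metric: str, host: str = "127.0.0.1", port: int = 8125):
    """Time the enclosed block (failed attempts included) and send the
    elapsed milliseconds to statsd over UDP.

    Host, port and metric name are assumptions; adjust them for your statsd setup.
    """
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(statsd_timing_packet(metric, elapsed_ms), (host, port))
```

In use, each upload would be wrapped as `with timed("rgw.upload"): upload_file(path)`, where `upload_file` is a hypothetical stand-in for the actual cloudfiles call.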

                 I've installed radosgw on all 7 Ceph servers.  I've tested
                 using a load balancer to distribute the API calls, as well as
                 pointing the 10 worker servers at a single instance.  I
                 haven't seen a real difference in performance either way.
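
Instead of a separate load balancer, the upload workers can be spread across the gateways on the client side. Below is a minimal Python sketch of static round-robin assignment. The `rgwN.example.com` URLs are hypothetical placeholders for the seven radosgw instances.

```python
from itertools import cycle

# Hypothetical endpoints standing in for the 7 radosgw instances.
GATEWAYS = [f"http://rgw{n}.example.com" for n in range(1, 8)]


def assign_gateways(worker_ids, gateways=GATEWAYS):
    """Assign each upload worker a gateway in round-robin order."""
    rotation = cycle(gateways)
    return {worker: next(rotation) for worker in worker_ids}


assignment = assign_gateways(range(10))
print(assignment[0], assignment[7], assignment[9])
# prints: http://rgw1.example.com http://rgw1.example.com http://rgw3.example.com
```

With 10 workers and 7 gateways, three gateways receive two workers each. Per-gateway load is therefore uneven unless the worker count is a multiple of the gateway count.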


                 Each of the Ceph servers has 16 Xeon cores at 2.53GHz and
                 72GB of RAM, with OCZ Vertex4 SSD drives for the journals and
                 Seagate Barracuda ES2 drives for storage.


                 Any help would be greatly appreciated.


                 _______________________________________________
                 ceph-users mailing list
                 ceph-users@xxxxxxxxxxxxxx
                 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




            --
            Photobucket <http://photobucket.com>

            *Bryan Stillwell*
            SENIOR SYSTEM ADMINISTRATOR

            E: bstillwell@xxxxxxxxxxxxxxx
            O: 303.228.5109
            M: 970.310.6085

            Facebook <http://www.facebook.com/photobucket>  Twitter
            <http://twitter.com/photobucket>  Photobucket
            <http://photobucket.com/images/photobucket>




