Re: Performance issues with small files


I need to restart the upload process again because all the objects
have a content-type of 'binary/octet-stream' instead of 'image/jpeg',
'image/png', etc.  I plan on enabling monitoring this time so we can
see if there are any signs of what might be going on.  Did you want me
to increase the number of buckets to see if that changes anything?
This is pretty easy for me to do.
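[Editorial sketch: one minimal way to set the type client-side before the re-upload, using only the stdlib mimetypes module. The cloudfiles-style calls in the trailing comment are illustrative and not verified against that module's API.]

```python
import mimetypes

def guess_content_type(filename, default='binary/octet-stream'):
    """Map a filename to a MIME type instead of the gateway's generic default."""
    ctype, _encoding = mimetypes.guess_type(filename)
    return ctype or default

# Hypothetical cloudfiles-style upload -- the point is setting the type
# explicitly so the stored object isn't tagged binary/octet-stream:
#   obj = container.create_object(name)
#   obj.content_type = guess_content_type(name)
#   obj.load_from_filename(name)
```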

Bryan

On Thu, Sep 5, 2013 at 11:07 AM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> Based on your numbers, you were at something like an average of 186 objects
> per bucket at the 20 hour mark?  I wonder how this trend compares to what
> you'd see with a single bucket.
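[Editorial check: the per-bucket estimate follows directly from Bryan's numbers, ~3.5 million objects over 18,836 buckets:]

```python
# Figures from the thread: ~3.5 million objects spread over 18,836 buckets
objects, buckets = 3_500_000, 18_836
print(objects / buckets)  # ~185.8, i.e. "something like 186 objects per bucket"
```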
>
> With that many buckets you should have indexes well spread across all of the
> OSDs.  It'd be interesting to know what the iops/throughput is on all of
> your OSDs now (blktrace/seekwatcher can help here, but they are not the
> easiest tools to setup/use).
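[Editorial sketch: if blktrace/seekwatcher are too heavy to set up, iostat from the sysstat package gives a rougher but much quicker per-device view. The device names below are placeholders; substitute the actual OSD data disks and journal SSDs.]

```shell
# Extended per-device stats: placeholder devices, 5-second interval, 3 samples.
iostat -dx sdb sdc sdd 5 3

# Full trace of a single device with blktrace (heavier, as noted above);
# these invocations are a sketch, check the man pages for your versions:
#   blktrace -d /dev/sdb -o osd-sdb -w 60
#   seekwatcher -t osd-sdb.blktrace.0 -o osd-sdb.png
```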
>
> Mark
>
> On 09/05/2013 11:59 AM, Bryan Stillwell wrote:
>>
>> Mark,
>>
>> Yesterday I blew away all the objects and restarted my test using
>> multiple buckets, and things are definitely better!
>>
>> After ~20 hours I've already uploaded ~3.5 million objects, which is
>> much better than the ~1.5 million I did over ~96 hours this past
>> weekend.  Unfortunately it seems that things have slowed down a bit.
>> The average upload rate over those first 20 hours was ~48
>> objects/second, but now I'm only seeing ~20 objects/second.  This is
>> with 18,836 buckets.
>>
>> Bryan
>>
>> On Wed, Sep 4, 2013 at 12:43 PM, Bryan Stillwell
>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>
>>> So far I haven't seen much of a change.  It's still working through
>>> removing the bucket that reached 1.5 million objects though (my guess
>>> is that'll take a few more days), so I believe that might have
>>> something to do with it.
>>>
>>> Bryan
>>>
>>>
>>> On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx>
>>> wrote:
>>>>
>>>>
>>>> Bryan,
>>>>
>>>> Good explanation.  How's performance now that you've spread the load
>>>> over multiple buckets?
>>>>
>>>> Mark
>>>>
>>>> On 09/04/2013 12:39 PM, Bryan Stillwell wrote:
>>>>>
>>>>>
>>>>> Bill,
>>>>>
>>>>> I've run into a similar issue with objects averaging ~100KiB.  The
>>>>> explanation I received on IRC is that there are scaling issues if you're
>>>>> uploading them all to the same bucket, because the bucket index isn't
>>>>> sharded.  The recommended solution is to spread the objects out to a lot of
>>>>> buckets.  However, that ran me into another issue once I hit the
>>>>> 1,000-bucket per-user limit.  I raised the limit to unlimited with
>>>>> this command:
>>>>>
>>>>> radosgw-admin user modify --uid=your_username --max-buckets=0
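[Editorial sketch of the "lots of buckets" approach: derive the bucket deterministically from a hash of the object name, so uploaders need no coordination. The prefix and shard count are assumptions, not anything radosgw requires.]

```python
import hashlib

SHARDS = 1024  # assumption: enough buckets to keep each unsharded index small

def bucket_for(key, prefix='img-shard-', shards=SHARDS):
    """Deterministically map an object name to one of `shards` buckets,
    spreading the per-bucket index load across many indexes."""
    h = int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)
    return '%s%04d' % (prefix, h % shards)
```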
>>>>>
>>>>> Bryan
>>>>>
>>>>>
>>>>> On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer <bill.omer@xxxxxxxxx
>>>>> <mailto:bill.omer@xxxxxxxxx>> wrote:
>>>>>
>>>>>      I'm testing ceph for storing a very large number of small files.
>>>>>      I'm seeing some performance issues and would like to see if anyone
>>>>>      could offer any insight as to what I could do to correct this.
>>>>>
>>>>>      Some numbers:
>>>>>
>>>>>      Uploaded 184,111 files, with an average file size of 5KB, using
>>>>>      10 separate servers to upload the files using Python and the
>>>>>      cloudfiles module.  I stopped uploading after 53 minutes, which
>>>>>      works out to an average of ~5.8 files per second per node.
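[Editorial check of that rate, plain arithmetic on the figures above:]

```python
# 184,111 files in 53 minutes across 10 uploader nodes
files, minutes, nodes = 184_111, 53, 10
rate_per_node = files / (minutes * 60) / nodes
print(round(rate_per_node, 2))  # 5.79 -- roughly the ~5.8 files/s/node quoted
```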
>>>>>
>>>>>
>>>>>      My storage cluster consists of 21 OSDs across 7 servers, with
>>>>>      their journals written to SSD drives.  I've done a default
>>>>>      installation, using ceph-deploy with the dumpling release.
>>>>>
>>>>>      I'm using statsd to monitor the performance, and what's
>>>>>      interesting is that when I start with an empty bucket, performance
>>>>>      is amazing, with average response times of 20-50ms.  However, as
>>>>>      time goes on, the response times climb into the hundreds of
>>>>>      milliseconds and the average number of uploads per second drops.
>>>>>
>>>>>      I've installed radosgw on all 7 ceph servers.  I've tested using a
>>>>>      load balancer to distribute the api calls, as well as pointing the
>>>>>      10 worker servers to a single instance.  I've not seen a real
>>>>>      difference in performance with this either.
>>>>>
>>>>>
>>>>>      Each of the ceph servers is a 16-core Xeon 2.53GHz machine with
>>>>>      72GB of RAM, OCZ Vertex4 SSDs for the journals, and Seagate
>>>>>      Barracuda ES2 drives for storage.
>>>>>
>>>>>
>>>>>      Any help would be greatly appreciated.
>>>>>
>>>>>
>>>>>      _______________________________________________
>>>>>      ceph-users mailing list
>>>>>      ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>>>>>      http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



