Re: Performance issues with small files

Wouldn't using only the first two characters in the file name result
in fewer than 65k buckets being used?

For example, if the file names contained only 0-9 and a-f, that would
be just 256 buckets (16 * 16).  Or if they contained 0-9, a-z, and A-Z,
that would only be 3,844 buckets (62 * 62).
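
A quick way to check is to count the distinct two-character prefixes
across your object names.  Here's a minimal Python sketch (the "names"
list is just a stand-in for the real file names):

    from collections import Counter

    def buckets_used(names):
        # Each object lands in the bucket named after the first two
        # characters of its name, so the number of distinct prefixes
        # is the number of buckets that actually get used.
        return Counter(name[:2] for name in names)

    # Even 65,536 distinct hex file names only ever touch 256 buckets:
    names = ['%04x' % i for i in range(65536)]
    print(len(buckets_used(names)))  # 256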

Bryan


On Thu, Sep 5, 2013 at 8:19 AM, Bill Omer <bill.omer@xxxxxxxxx> wrote:
>
> That's correct.  We created 65k buckets, using two hex characters as the naming convention, then stored the files in each container based on the first two characters of the file name.  The end result was 20-50 files per bucket.  Once all of the buckets were created and files were being loaded, we still observed an increase in latency over time.
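>
> Roughly, the upload path looks like this (a minimal sketch, assuming
> the python-cloudfiles API; the credentials, auth URL, and helper name
> are illustrative):
>
>     import cloudfiles
>
>     # Swift-compatible auth endpoint exposed by radosgw (illustrative URL).
>     conn = cloudfiles.get_connection('user', 'api_key',
>                                      authurl='http://radosgw-host/auth')
>
>     def upload(name, data):
>         # The bucket (container) is named after the first two
>         # characters of the file name; re-creating an existing
>         # container is harmless.
>         container = conn.create_container(name[:2])
>         obj = container.create_object(name)
>         obj.write(data)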
>
> Is there a way to disable indexing?  Or are there other settings you could suggest to speed this process up?
>
>
> On Wed, Sep 4, 2013 at 5:21 PM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
>>
>> Just for clarification, distributing objects over lots of buckets isn't helping improve small object performance?
>>
>> The degradation over time is similar to something I've seen in the past, where the number of seeks on the underlying OSD device grows over time.  Is it always (temporarily) resolved by writing to a new, empty bucket?
>>
>> Mark
>>
>>
>> On 09/04/2013 02:45 PM, Bill Omer wrote:
>>>
>>> We've actually done the same thing, creating 65k buckets and storing
>>> 20-50 objects in each.  No real change; nothing noticeable, anyway.
>>>
>>>
>>> On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell
>>> <bstillwell@xxxxxxxxxxxxxxx> wrote:
>>>
>>>     So far I haven't seen much of a change.  It's still working
>>>     through removing the bucket that reached 1.5 million objects,
>>>     though (my guess is that'll take a few more days), so I believe
>>>     that might have something to do with it.
>>>
>>>     Bryan
>>>
>>>
>>>     On Wed, Sep 4, 2013 at 12:14 PM, Mark Nelson
>>>     <mark.nelson@xxxxxxxxxxx> wrote:
>>>
>>>         Bryan,
>>>
>>>         Good explanation.  How's performance now that you've spread the
>>>         load over multiple buckets?
>>>
>>>         Mark
>>>
>>>         On 09/04/2013 12:39 PM, Bryan Stillwell wrote:
>>>
>>>             Bill,
>>>
>>>             I've run into a similar issue with objects averaging
>>>             ~100KiB.  The explanation I received on IRC is that
>>>             there are scaling issues if you're uploading them all
>>>             to the same bucket, because the bucket index isn't
>>>             sharded.  The recommended solution is to spread the
>>>             objects out over a lot of buckets.  However, that ran
>>>             me into another issue once I hit 1,000 buckets, which
>>>             is a per-user limit.  I made the limit unlimited with
>>>             this command:
>>>
>>>             radosgw-admin user modify --uid=your_username --max-buckets=0
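>>>
>>>             (You can confirm it took effect with "radosgw-admin
>>>             user info --uid=your_username"; the max_buckets field
>>>             in the output should now be 0.)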
>>>
>>>             Bryan
>>>
>>>
>>>             On Wed, Sep 4, 2013 at 11:27 AM, Bill Omer
>>>             <bill.omer@xxxxxxxxx> wrote:
>>>
>>>                  I'm testing ceph for storing a very large number
>>>                  of small files.  I'm seeing some performance
>>>                  issues and would like to see if anyone could offer
>>>                  any insight as to what I could do to correct this.
>>>
>>>                  Some numbers:
>>>
>>>                  Uploaded 184,111 files, with an average file size
>>>                  of 5KB, using 10 separate servers to perform the
>>>                  uploads with Python and the cloudfiles module.  I
>>>                  stopped uploading after 53 minutes, which works
>>>                  out to about 5.8 files per second per node.
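>>>
>>>                  (That's 184,111 files / (53 min * 60 s/min) / 10
>>>                  nodes ~= 5.8 files per second per node.)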
>>>
>>>
>>>                  My storage cluster consists of 21 OSDs across 7
>>>                  servers, with their journals written to SSD
>>>                  drives.  I've done a default installation, using
>>>                  ceph-deploy with the dumpling release.
>>>
>>>                  I'm using statsd to monitor the performance, and
>>>                  what's interesting is that when I start with an
>>>                  empty bucket, performance is amazing, with average
>>>                  response times of 20-50ms.  However, as time goes
>>>                  on, the response times climb into the hundreds of
>>>                  milliseconds and the average number of uploads per
>>>                  second drops.
>>>
>>>                  I've installed radosgw on all 7 ceph servers.
>>>                  I've tested using a load balancer to distribute
>>>                  the API calls, as well as pointing the 10 worker
>>>                  servers at a single instance.  I've not seen a
>>>                  real difference in performance with either
>>>                  approach.
>>>
>>>
>>>                  Each of the ceph servers is a 16-core Xeon
>>>                  2.53GHz with 72GB of RAM, OCZ Vertex4 SSD drives
>>>                  for the journals, and Seagate Barracuda ES.2
>>>                  drives for storage.
>>>
>>>
>>>                  Any help would be greatly appreciated.
>>>
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



