Radosgw Timeout

On 5/23/14 03:47, Georg Höllrigl wrote:
>
>
> On 22.05.2014 17:30, Craig Lewis wrote:
>> On 5/22/14 06:16, Georg Höllrigl wrote:
>>>
>>> I have created one bucket that holds many small files, separated into
>>> different "directories". But whenever I try to access the bucket, I
>>> only run into some timeout. The timeout is at around 30 - 100 seconds.
>>> This is smaller than the Apache timeout of 300 seconds.
>>>
>> Just so we're all talking about the same things, what does "many small
>> files" mean to you?  Also, how are you separating them into
>> "directories"?  Are you just giving files in the same "directory" the
>> same leading string, like "dir1_subdir1_filename"?
>
> I can only estimate how many files. ATM I've 25M files on the origin 
> but only 1/10th has been synced to radosgw. These are distributed 
> through 20 folders, each containing about 2k directories with ~ 100 - 
> 500 files each.
>
> Do you think that's too much in that use case?
>
The recommendations I've seen indicate that 25M objects per bucket is 
doable, but painful.  The bucket index is itself an object stored in 
Ceph, and it holds the list of every object in that bucket.  With a 
single bucket containing 25M objects, you're going to hotspot on that 
one index object.  Think of a bucket like a directory on a filesystem.  
You wouldn't store 25M files in a single directory.

Buckets are a bit simpler than directories.  They don't have to track 
permissions, per-file ACLs, and all the other things that POSIX 
filesystems do.  You can push them harder than a normal directory, but 
the same concepts still apply.  The more files you put in a 
bucket/directory, the slower it gets.  Most filesystems impose a hard 
limit on the number of files in a directory.  RadosGW doesn't have a 
limit; it just gets slower.
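
Since the "directories" are really just key prefixes, the usual trick 
is a '/' in the key name plus a delimiter when you list.  A minimal 
boto sketch, with the endpoint, credentials, and names all made up:

    import boto
    import boto.s3.connection

    # Hypothetical endpoint and credentials -- substitute your own.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='radosgw.example.com',
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket('mybucket')

    # Keys like 'dir1/subdir1/file.txt' are flat names; the '/' means
    # nothing to RadosGW.  A delimiter makes the listing collapse
    # common prefixes, so they come back looking like subdirectories.
    for entry in bucket.list(prefix='dir1/', delimiter='/'):
        print entry.name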

Even the list of buckets has this problem.  You wouldn't want to create 
25M buckets with one object each.  By default, there is a 1000 bucket 
limit per user, but you can increase that.
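
For reference, raising that limit is a one-liner (the uid and number 
here are placeholders):

    radosgw-admin user modify --uid=myuser --max-buckets=5000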


If you can handle using 20 buckets, it would be worthwhile to put each 
of your top 20 folders into its own bucket.  If you can break things 
apart even more, that would be even better.
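
A minimal sketch of the kind of mapping I mean, assuming your object 
names start with the folder name (the 'mydata-' naming is made up):

    # 'folder03/abc/file.dat' -> bucket 'mydata-folder03', key 'abc/file.dat'
    def route(key):
        top, _, rest = key.partition('/')
        return 'mydata-' + top, rest

    bucket_name, obj_key = route('folder03/abc/file.dat')
    # then upload with the connection from the earlier sketch:
    # conn.get_bucket(bucket_name).new_key(obj_key) \
    #     .set_contents_from_filename(local_path)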

I mentioned that I have a bunch of buckets with ~1M objects each. GET 
and PUT of objects is still fast, but listing the contents of the bucket 
takes a long time.  Each bucket takes 20-30 minutes to get a full 
listing.  If you're going to be doing a lot of bucket listing, you might 
want to keep each bucket below 1000 items.  Maybe each of your 2k 
directories gets its own bucket.
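
The slowness is easy to see once you remember that the S3 API returns 
at most 1000 keys per request.  A rough timing sketch, reusing the 
boto connection from above:

    import time

    start = time.time()
    count = 0
    # bucket.list() quietly issues one request per 1000 keys, so a
    # 1M-object bucket means ~1000 sequential round trips, all of
    # them reading the same bucket index object.
    for key in bucket.list():
        count += 1
    print '%d keys listed in %.1f seconds' % (count, time.time() - start)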


If using more than one bucket is difficult, then 25M objects in one 
bucket will work.


-- 

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/>  | Twitter 
<http://www.twitter.com/centraldesktop>  | Facebook 
<http://www.facebook.com/CentralDesktop>  | LinkedIn 
<http://www.linkedin.com/groups?gid=147417>  | Blog 
<http://cdblog.centraldesktop.com/>


