On 5/23/14 03:47, Georg Höllrigl wrote:
>
> On 22.05.2014 17:30, Craig Lewis wrote:
>> On 5/22/14 06:16, Georg Höllrigl wrote:
>>>
>>> I have created one bucket that holds many small files, separated
>>> into different "directories". But whenever I try to access the
>>> bucket, I only run into some timeout. The timeout is at around
>>> 30 - 100 seconds. This is smaller than the Apache timeout of
>>> 300 seconds.
>>>
>> Just so we're all talking about the same things, what does "many
>> small files" mean to you? Also, how are you separating them into
>> "directories"? Are you just giving files in the same "directory"
>> the same leading string, like "dir1_subdir1_filename"?
>
> I can only estimate how many files. At the moment I've 25M files on
> the origin, but only 1/10th has been synced to radosgw. These are
> distributed through 20 folders, each containing about 2k directories
> with ~100 - 500 files each.
>
> Do you think that's too much for this use case?
>

The recommendations I've seen indicate that 25M objects per bucket is
doable, but painful. The bucket index is itself an object stored in
Ceph, and it holds the list of every object in the bucket. With a
single bucket containing 25M objects, you're going to hotspot on that
one index object.

Think of a bucket like a directory on a filesystem. You wouldn't
store 25M files in a single directory. Buckets are a bit simpler than
directories: they don't have to track permissions, per-file ACLs, and
all the other things that POSIX filesystems do, so you can push them
harder than a normal directory. But the same concepts still apply:
the more files you put in a bucket/directory, the slower it gets.
Most filesystems impose a hard limit on the number of files in a
directory; RadosGW doesn't have a limit, it just gets slower.

Even the list of buckets has this problem. You wouldn't want to
create 25M buckets with one object each. By default there is a
1000-bucket limit per user, but you can increase that (see the P.S.
below for the command).

If you can handle using 20 buckets, it would be worthwhile to put
each of your top 20 folders into its own bucket. If you can break
things apart even further, so much the better. (There's a rough
sketch of the bucket-per-folder mapping at the end of this mail.)

I mentioned that I have a bunch of buckets with ~1M objects each. GET
and PUT of objects is still fast, but listing the contents of a
bucket takes a long time: 20-30 minutes for a full listing of each
bucket. If you're going to be doing a lot of bucket listing, you
might want to keep each bucket below 1000 items. Maybe each of your
2k directories gets its own bucket.

If using more than one bucket is difficult, then 25M objects in one
bucket will work.

--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
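P.S. Raising the per-user bucket limit is done with radosgw-admin.
Something like this should work (the uid is a placeholder for
whatever uid you created the user with):

    radosgw-admin user modify --uid=<your-user> --max-buckets=100

You can check the current value with "radosgw-admin user info
--uid=<your-user>".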
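P.P.S. If you do split the top-level folders into their own buckets,
the key mapping is just a prefix split. Here's a minimal sketch using
boto against the radosgw S3 API (host, credentials, and the file
paths are placeholders, and I haven't run this exact snippet; it's
only meant to show the idea):

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='<access key>',
        aws_secret_access_key='<secret key>',
        host='radosgw.example.com',  # your radosgw endpoint
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    def split_path(path):
        # "folder03/dir0417/file.dat" -> ("folder03", "dir0417/file.dat")
        top, _, rest = path.partition('/')
        return top, rest

    bucket_name, key_name = split_path('folder03/dir0417/file.dat')
    # create_bucket returns the existing bucket if it's already there
    bucket = conn.create_bucket(bucket_name)
    key = bucket.new_key(key_name)
    key.set_contents_from_filename('/origin/folder03/dir0417/file.dat')

Listing any one bucket (bucket.list()) then only has to walk that
folder's index instead of one index holding all 25M entries.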