Thanks John,

I really do mean that my files are too small for HDFS: the majority of them will be under 64M, which I think is (still?) the default HDFS block size, *and also* they will be very numerous. As such, they would quickly consume a huge aggregate amount of RAM on the HDFS name node, which keeps a certain number of bytes of metadata in memory per file. In that sense, the name node seems to have been designed to manage a collection of huge files, not a huge collection of small files. Or at least, judging from the documentation, it is not optimized for that.

A constructive approach might be to simply allocate a large server instance for the HDFS name node, but that may only be a first step on a path towards discovering the next bottleneck of using HDFS for such files, the hard and long way.

Yes, I am aware that HDFS has some dedicated APIs for handling small files, and that there are community wrappers for managing small files, but they seem a bit hackish, or feel like "too many moving parts" for a simple scenario.

What do you think, and what do you think about Ceph for this scenario?

Thanks in advance!
Matan

On Thu, Jul 31, 2014 at 7:20 PM, John Spray <john.spray at redhat.com> wrote:

> On Wed, Jul 30, 2014 at 5:08 PM, Matan Safriel <dev.matan at gmail.com> wrote:
> > I'm looking for a distributed file system, for large JSON documents. My file
> > sizes are roughly between 20M and 100M, so they are too large for couchbase,
> > mongodb, even possibly Riak, but too small (by an order of magnitude) for
> > HDFS. Would you recommend Ceph for this kind of scenario?
>
> When you say they're too small for HDFS, do you really mean they're
> too numerous? How many are we talking about?
>
> If your use case calls for just puts and gets of named serialized
> blobs, you may be best off with the RGW or librados object store
> interfaces to Ceph, rather than the file system per se.
>
> > Additional question - will it also install and behave gracefully as a
> > single-node cluster running on a single linux machine, in a dev scenario
> > and/or a unit test machine scenario?
>
> Yes, that's how some of the ceph tests themselves operate.
>
> Cheers,
> John
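
For a rough sense of the name node memory concern raised in the reply above, here is a back-of-envelope sketch in Python. It assumes roughly 150 bytes of name node memory per namespace object (one per file plus one per block), which is a commonly quoted rule of thumb and not a figure from this thread. The function name and constants are illustrative only, and real usage depends on the Hadoop version and configuration.

```python
# Rough name node memory estimate for a large number of files.
# ASSUMPTION: ~150 bytes per namespace object (file or block). This is a
# commonly quoted rule of thumb, not a number taken from this thread.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 64 * 1024 * 1024  # 64M, the block size mentioned in the message


def namenode_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Estimate name node memory: one file object plus one object per block."""
    # Ceiling division; even a tiny file occupies at least one block entry.
    blocks_per_file = max(1, -(-file_size // block_size))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical file counts, chosen only to show how the estimate scales.
    for n in (1_000_000, 100_000_000):
        est = namenode_bytes(n, 20 * 1024 * 1024)
        print(f"{n:>11,} files of 20M: ~{est / 2**30:.2f} GiB")
```

Under this assumption the estimate grows linearly with the number of files, so the file count matters far more than the individual file size, which is the point of the "how many are we talking about?" question.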