We have a similar setup with millions of small XML fragments (5-15 kB each), and the dataset is expected to grow much larger.
Can we continue to expect O(1) lookup performance as the dataset increases? The documentation emphasizes how well the
system scales in storage capacity, but says nothing about scaling in file count (inodes, for lack of a better term).
Best,
Erik Osterman
Bernhard J. M. Gruen wrote:
Hello list members,
at the moment we are searching for a storage cluster solution that
should fulfill the following specifications:
* 3 cluster nodes, each with 24x SATA disks (750 GB) in a RAID 6
* data replication (each file stored on at least 2 of the 3 nodes)
* easy disaster recovery (in case a node fails completely)
* high-speed concurrent read access to many small files
(20-200 million files, 5 kB to 60 kB each)
* files are served by a web server
* average write speed
* the system should keep working even if one (or two) storage nodes are completely down
At the moment we think most of this is already possible with
GlusterFS. Only the recovery part is not yet possible (it should
arrive with 1.4).
But how does GlusterFS perform with that many small files? I could
imagine that GlusterFS is not optimized for this usage, because it
has no metadata server to help clients find the right server to ask
for a file.
Is glusterFS the right system for us?