> I've posted to the list about this issue before actually.
> We had/have a similar requirement for storing a very large number of
> fairly small files, and originally had them all in just a few directories
> in glusterfs.

Directory layout also matters here ("number of files" vs. "number of
directories" in the hierarchy). It is also necessary to know how the
application reaches these individual files (access patterns).

> It turns out that Glusterfs is really badly suited to directories with
> large numbers of files in them. If you can split them up, do so, and
> performance will become tolerable again.
>
> But even then it wasn't great. Self-heal can swamp the network, making
> access for clients so slow as to cause problems.

This analysis is wrong - the self-heal daemon runs in lower-priority
threads and should not be swamping the network at all. By default it
never competes with user I/O traffic. Which version was this tested
against?

> For your use case (wanting distributed, replicated storage for large
> numbers of 1MB files) I suggest you check out Riak and the Riak CS
> add-on. It's proven to be great for that particular use-case for us.

Beyond all of that, there is a fair amount of tuning which should be
done at the kernel, network and filesystem level as well. NoSQL stores
such as Riak could be beneficial, but again only on a use-case basis.

-- 
*Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes*
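As an aside, the advice to split up directories with large numbers of
files is usually implemented by hashing the file name into a shallow
tree of subdirectories. Below is a minimal sketch of that idea; it is
not GlusterFS-specific, and the `shard_path` helper, the choice of MD5,
and the two-level/two-hex-character layout are my own assumptions for
illustration, not anything from the original thread:

```python
import hashlib
import os


def shard_path(root: str, name: str, levels: int = 2, width: int = 2) -> str:
    """Map a flat file name to a nested path, e.g.
    'photo123.jpg' -> root/ab/cd/photo123.jpg.

    With 2 levels of 2 hex characters each there are 256 * 256 = 65536
    leaf directories, which keeps per-directory entry counts small even
    for tens of millions of files.
    """
    # Hash only the name, so the same name always maps to the same shard
    # directory (lookups need no index, just the name itself).
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


# Example: compute where a file would live under a gluster mount.
path = shard_path("/mnt/gluster/store", "photo123.jpg")
```

Because the shard is derived from the name alone, readers and writers
agree on the location without any shared lookup table; the trade-off is
that a plain directory listing no longer shows all files in one place.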