On 09/29/2011 01:44 PM, David Miller wrote:
> On Thu, Sep 29, 2011 at 1:32 PM, David Miller <david3d at gmail.com
> <mailto:david3d at gmail.com>> wrote:
>
> Couldn't you accomplish the same thing with flashcache?
> https://github.com/facebook/flashcache/
>
>
> I should expand on that a little bit. Flashcache is a kernel module
> created by Facebook that uses the device mapper interface in Linux to
> provide an SSD cache layer for any block device.
>
> What I think would be interesting is using flashcache with a PCIe SSD as
> the caching device. That would add about $500-$600 to the cost of each
> brick node but should be able to buffer the active IO from the spinning
> media pretty well.

Erp ... low-end PCIe flash with decent performance starts much higher
than $500-$600 USD.

> Something like this.
> http://www.amazon.com/OCZ-Technology-Drive-240GB-Express/dp/B0058RECUE
> or something from FusionIO if you want something that's aimed more at
> the enterprise.

Flashcache is reasonably good, but there are many variables in using it,
and it's designed for a different use case. For most people writeback
mode may be reasonable, but other use cases would require different
configurations.

This said, please understand that flashcache (and L2ARC, and other
similar things) are *not* silver bullets; they are not magical things
that will instantly make something far better at no cost or effort. They
do introduce additional complexity, and additional tuning points.

The thing you cannot get rid of, the network traversal, is implicated in
much of the performance degradation for small files. Putting the file
system on a RAM disk (if that were possible; tmpfs doesn't support the
needed xattrs) wouldn't make the system much faster for small files.
Eliminating the network traversal and doing local distributed caching of
metadata on the client side ... could ... but that would be a huge new
complication, and I'd argue that it probably isn't worth it.

For the near term, small-file performance is going to be bad. You might
be able to play some games to make it better (L2ARC etc. could help in
some aspects, but they won't be universally much better).

What matters most is a very good design on the storage backend (we are
biased, given what it is we sell/support), very good networking, and a
very good Gluster implementation and tuning. It's really easy to hit
very slow performance by missing critical elements. We field many
inquiries that start out with "we built our own and the performance
isn't that good." You won't get good performance out of the cluster file
system if the underlying file system and storage design aren't going to
give it to you in the first place.

This said, please understand that there is a (significant) performance
cost to all those nice features in ZFS, and there is a reason why it is
not generally considered a high-performance file system. So if you start
building with it, you shouldn't necessarily assume that the whole is
going to be faster than the sum of the parts. It might be worse. This is
a caution from someone who has tested and shipped many different file
systems in the past, ZFS included, on Solaris and other machines. There
is a very significant performance penalty one pays for using some of
these features. You have to decide whether that penalty is worth it.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615