On Mon, Oct 18, 2010 at 09:42:04AM -0400, Angelo McComis wrote:
> All:
>
> Apologies but I am new to this list, and somewhat new to XFS.
>
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.

Yup, perfect use case for XFS.

> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone,

Yes, they are right about detected metadata corruption causing a
filesystem _shutdown_, but that does not mean that a metadata
corruption event will cause your entire filesystem to disappear.

Besides, the worst case for _any_ filesystem is that it gets
corrupted beyond repair and you have to restore from backups, so you
still have to plan for this eventuality when dealing with disaster
recovery scenarios.

What they neglect to mention is that XFS has a lot of metadata
corruption detection code, and shuts down at the first detection to
prevent the filesystem from being further damaged before a repair
process can be run. Apart from btrfs, XFS has the best run-time
metadata corruption detection of any filesystem in Linux, and even so
there are plans to improve that over the next year or so....

> and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.

Single root design has nothing to do with performance on
multithreaded workloads. However, XFS really isn't a single-root
design. While it has a single root for the _directory structure_, the
allocation subsystem has a root per allocation group and hence
allocation operations can occur in parallel in XFS.

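As a toy illustration of the per-allocation-group point, the following
Python sketch gives each group its own lock and free-space cursor, so
threads allocating in different groups never contend on a shared lock.
It is not XFS code: the names `AllocationGroup`, `ToyFilesystem` and
`run_workers` and the bump-pointer allocator are all assumptions made up
for this example, and because of Python's global interpreter lock it
shows the locking structure rather than real throughput.

```python
import threading


class AllocationGroup:
    """Hypothetical stand-in for one allocation group (not XFS code)."""

    def __init__(self, ag_number, size_blocks):
        self.ag_number = ag_number
        self.size_blocks = size_blocks
        self.next_free = 0              # simple bump-pointer free-space cursor
        self.lock = threading.Lock()    # one lock per group, not one global lock

    def allocate(self, nblocks):
        """Return (ag_number, start_block), or None if this group is full."""
        with self.lock:
            if self.next_free + nblocks > self.size_blocks:
                return None
            start = self.next_free
            self.next_free += nblocks
            return (self.ag_number, start)


class ToyFilesystem:
    """Hypothetical filesystem made of independent allocation groups."""

    def __init__(self, ag_count, ag_size_blocks):
        self.groups = [AllocationGroup(i, ag_size_blocks) for i in range(ag_count)]

    def allocate(self, hint, nblocks):
        """Try the hinted group first, then fall back to the others in order."""
        count = len(self.groups)
        for offset in range(count):
            extent = self.groups[(hint + offset) % count].allocate(nblocks)
            if extent is not None:
                return extent
        raise OSError("no space left in any allocation group")


def run_workers(fs, worker_count, allocs_per_worker, nblocks):
    """Run one allocating thread per worker; worker i hints at group i."""
    results = [[] for _ in range(worker_count)]

    def worker(idx):
        for _ in range(allocs_per_worker):
            results[idx].append(fs.allocate(idx, nblocks))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With four groups and four workers, as in
`run_workers(ToyFilesystem(4, 1000), 4, 100, 8)`, each worker stays inside
its own group and only ever takes that group's lock. A single global
lock would instead serialise all four workers.
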
Hence the only points of serialisation for most operations are either
an individual directory being operated on or the journalling
subsystem. Simultaneous directory modifications are not something
that databases (or any application) do very often, so that point of
serialisation is not something you're ever likely to hit. Besides,
this serialisation is a limitation of the Linux VFS, not something
specific to XFS.

Similarly, databases don't do a lot of metadata operations so the
journalling subsystem won't be a bottleneck, either.

Databases do large amounts of _data IO_ to and from files, and that
is what XFS excels at. Especially if the database is using direct IO,
because then XFS allows concurrent read and write access to the file
so the only limitations in throughput are the storage subsystem and
the database itself...

And FWIW, I've done nothing but improve multithreaded throughput for
metadata operations in XFS for the past few months, so the claims
your vendor is making really have no basis in reality.

> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.
>
> Surely, something like corruption and bottlenecks under heavy load /
> multi-threaded use would be a bug that would be addressed, right?

Yes, absolutely. Please ask the vendor to raise bugs for any issues
they have seen next time they say this to you.

> And surely, something like a BTree structure, with a root node, journaled
> metadata, etc. would be inherent in other filesystem choices as well, right?

Yes.

> The vendor, in the end, did recommend ext4, but ext4 is not in my mainline
> Linux kernel as anything beyond "tech preview" at this point.

Oh, man, I almost spat out my coffee all over my keyboard when I read
that. I needed a good laugh this morning. :)

So what we have here is a classic case of FUD. Your vendor's
recommendation to use ext4 instead of XFS directly contradicts their
message not to use XFS.
ext4 is exactly the same as XFS in regard to the single
root/metadata corruption design issues, but ext4 does a much worse
job of detecting corruption at runtime compared to XFS. ext4 is also
immature, is pretty much untested in long-term production
environments and has developers that are already struggling to
understand and maintain the code because of the way it has been
implemented.

IOWs, your vendor is recommending a filesystem that is _inferior to
XFS_. That's a classic sales technique - level FUD at a competitor,
then recommend an inferior solution as the _better alternative_. The
key to this technique is that the alternative needs to be something
that the customer will recognise as not being viable for deployment
in business critical systems. So now the customer doesn't want to use
either, and they are ready for the "but we've got this really robust
solution and it only costs $$$" sucker-punch.

My best guess at the reason for such a carefully targeted sales
technique is that their database is just as robust and performs just
as well on XFS as it does on their own solutions that cost mega-$$$.
What other motivation is there for taking such an approach?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx