On 22/10/13 18:56, Stan Hoeppner wrote:
> On 10/22/2013 2:24 AM, David Brown wrote:
>> On 22/10/13 02:36, Steve Bergman wrote:
>>
>> <snip>
>>
>>> But hey, this is going to be a very nice opportunity for observing
>>> XFS's savvy with parallel i/o.
>>
>> You mentioned using a 6-drive RAID10 in your first email, with XFS on
>> top of that.  Stan is the expert here, but my understanding is that
>> you should go for three 2-drive RAID1 pairs, and then use an md linear
>> "raid" for these pairs and put XFS on top of that in order to get the
>> full benefits of XFS parallelism.
>
> XFS on a concatenation, which is what you described above, is a very
> workload specific storage architecture.  It is not a general use
> architecture, and almost never good for database workloads.  Here most
> of the data is stored in a single file or a small set of files, in a
> single directory.  With such a DB workload and 3 concatenated mirrors,
> only 1/3rd of the spindles would see the vast majority of the IO.
>

That's a good point - while I had noted that the OP was running a
database, I had forgotten that it is a virtual Windows machine running
an MS SQL database.  The virtual machine will use a single large file
for its virtual hard disk image, so RAID10 + XFS will beat RAID1 +
concat + XFS for that workload.

On the other hand, he is also serving 100+ freenx desktop users.  As
far as I understand it (and I'm very happy for corrections if I'm
wrong), that means a /home directory with 100+ sub-directories for the
different users - and that /is/ one of the ideal cases for concat + XFS
parallelism.

Only the OP can say which type of access is going to dominate and where
the balance should go.

As a more general point, I don't think you can generalise that database
workloads normally store their data in a single big file or a small set
of files.  I haven't worked with many databases, and none bigger than a
few hundred MB, so I am theorising here from things I have read rather
than from personal practice.  But certainly with postgresql the data is
split across many files - each database gets its own directory, each
table is stored in its own file, and very big tables are split into
multiple segment files.  At some point those files end up spread over
multiple AG's, giving parallelism (with a bit of luck).  I am guessing
other databases are somewhat similar.  Of course, like any database
tuning, this will all be highly load-dependent.
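
For concreteness, the concat layout Stan and I were discussing would be
set up roughly like this - device names, array numbers and the mount
point are only placeholders for illustration, so adjust them for the
real hardware:

  # three 2-drive RAID1 pairs (hypothetical device names)
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde /dev/sdf

  # concatenate the three pairs with an md linear array
  mdadm --create /dev/md10 --level=linear --raid-devices=3 \
        /dev/md1 /dev/md2 /dev/md3

  # one allocation group per mirror pair (or a multiple of three),
  # so AGs line up with the concat members
  mkfs.xfs -d agcount=3 /dev/md10

  # inode64 lets XFS place new directories (and their files) in
  # different AGs rather than packing inodes into the first AG
  mount -o inode64 /dev/md10 /home

The agcount and the inode64 mount option are what let XFS put different
users' home directories in different AGs, and hence on different mirror
pairs - which is where the parallelism for the 100+ freenx users would
come from.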
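And on the postgresql point, the on-disk layout is easy to check on any
running instance - something along these lines, where the database
name, table name and data directory path are just examples:

  # ask postgres where a table lives on disk (example names)
  psql -d mydb -c "SELECT pg_relation_filepath('orders');"
  #  pg_relation_filepath
  # ----------------------
  #  base/16384/16422

  # big tables are split into 1 GB segments: 16422, 16422.1, 16422.2, ...
  ls -l /var/lib/pgsql/data/base/16384/16422*

So even a single big table turns into a set of files on disk, rather
than one monolithic file.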