First of all, thank you to the people who took the time to help illuminate this issue. To summarize: for unknown reasons, the 4-port SATA controller on the Dell PE T310 has an aggregate limitation of ~1.75 Gbit/s on the A&B and C&D port pairs. Each port can deliver more than that to a single drive, but when both ports in a pair are read or written simultaneously, each port gets ~0.87 Gbit/s (which is probably some higher nominal figure minus overhead).

The testing of (1) my workload and (2) sequential read/write under various RAID levels, filesystems, and chunk sizes got tedious, so I decided to just automate the whole thing and let it run overnight (a rough sketch of the harness is included further down, for anyone curious). My initial guess was that RAID5 might have an advantage for sequential writes in this situation, since parity is less bandwidth-intensive on writes than mirroring is, and I almost always have plenty of spare CPU cycles available. This turned out to be correct for ext4. (xfs still liked RAID10.) The best numbers for sequential read/write came from ext4 under 4-drive RAID5 at the default chunk size of 512k. xfs did its best under RAID10 with chunk sizes of either 32k or 64k (which came out about the same), but was not able to match the ext4 write performance, or even come close to the read performance.

The more important testing was of my actual target workload, which does a huge number of random writes building up a pair of files of ~2GB each. My suspicion was that RAID10 would yield the better performance here, since this is not a bandwidth-bound workload. This turned out to be correct for both ext4 and xfs. Here, the best performance again came from ext4 at the default chunk size of 512k, where the operation completed (including sync) in 11m24s, with xfs doing best at a 32k chunk size and completing in 13m07s.

With that established, I decided to focus on ext4 at 512k. For the system volumes, delayed allocation is acceptable. However, for the data partition, leaving delayed allocation turned on would be irresponsible. (We have point-of-sale data being collected throughout the day which could not be recovered from backup.) The testing shows that for this workload, mounting with "nodelalloc" entails only a 7% performance penalty, which is quite acceptable (and still faster than xfs). So that pretty much nails down my configuration: RAID10 with 512k chunks, ext4 mounted with nodelalloc for the data volume, and ext4 at the defaults for everything else.

Now, that said (and though I don't really intend to engage in a long thread over this), the subject of XFS's suitability for this kind of work has come up, and I'll address the key points, since I do believe in calling a spade a spade. Even if xfs had come out ahead on performance, I would not have considered it for my data partition. It's been said here that the major data loss bugs in xfs have been fixed, and that's probably true; one would hope that after 13 years the major data loss bugs would have been fixed. But xfs's data integrity problems are due not to bugs, but to fundamental design decisions which cannot be changed at this point. And there is plenty of recent evidence that xfs still has the same data integrity problems it has always had. For example, this recent report involving a very recent enterprise Linux version: http://toruonu.blogspot.com/2012/12/xfs-vs-ext4.html Simply Googling "xfs zero" and sorting by date yields pages and pages of recent hits.
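(An aside before getting back to xfs: for anyone who wants to reproduce this kind of matrix, the overnight harness I mentioned above had roughly the following shape. This is only a sketch, not the actual script; the device names, array name, sizes, and the simple dd placeholder workload are made up for illustration, and the real runs also covered the point-of-sale batch job and the nodelalloc mount variants.)

    #!/usr/bin/env python3
    # Sketch of an overnight md/filesystem benchmark loop (illustrative only).
    # WARNING: destroys the listed disks. Device names here are hypothetical.
    import itertools, subprocess, time

    DISKS  = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
    LEVELS = ["raid5", "raid10"]
    CHUNKS = ["32", "64", "512"]        # chunk sizes in KiB
    FSES   = ["ext4", "xfs"]
    MNT    = "/mnt/bench"

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    for level, chunk, fs in itertools.product(LEVELS, CHUNKS, FSES):
        # Build the array, make the filesystem, mount it.
        run(["mdadm", "--create", "/dev/md0", "--run", "--level", level,
             "--raid-devices", str(len(DISKS)), "--chunk", chunk] + DISKS)
        run(["mkfs.xfs", "-f", "/dev/md0"] if fs == "xfs"
            else ["mkfs.ext4", "-F", "/dev/md0"])
        run(["mount", "/dev/md0", MNT])

        # Placeholder workload: a simple sequential write with a final flush.
        t0 = time.time()
        run(["dd", "if=/dev/zero", "of=%s/seq.dat" % MNT,
             "bs=1M", "count=4096", "conv=fdatasync"])
        print("%s chunk=%sk %s: %.1fs" % (level, chunk, fs, time.time() - t0))

        # Tear everything down so the next combination starts clean.
        run(["umount", MNT])
        run(["mdadm", "--stop", "/dev/md0"])
        run(["mdadm", "--zero-superblock"] + DISKS)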
The fundamental design philosophy issues for xfs are the assumptions that:

1. Metadata is more important than data. (A brain-dead concept, to start with.)
2. Data loss is acceptable as long as the metadata is kept consistent.
3. Performance is only slightly less important than metadata, and far more important than data.

More specifically, the data integrity design problems for xfs are (primarily):

1. It only journals metadata, and doesn't order data writes to ensure that the data is always consistent with some valid state (even if it isn't the latest state).
2. It uses delayed allocation, which is inherently unsafe unless data writes are ordered ahead of the metadata. And you can't turn it off. (Please correct me if I'm wrong about that. I'd like to know.)

#1 is a brick wall; there's not much that can be done. Regarding #2, I think the xfs guys did model something on Ted Ts'o's ext4 patches to 2.6.30, which force the data to be flushed for certain common idioms (the idiom I have in mind is sketched below). (Though I think I heard that they did not adopt all of them. Not sure.) I do not consider even that full patch set to be more than a band-aid. Trusting important data to a store which employs either of the above designs is just irresponsible, and in general, responsible admins should never even consider it.

Regarding xfs performance, Dave Chinner made an interesting presentation (at LinuxConf AU 2012, IIRC) in which he demonstrated the metadata scalability work the xfs team had done, which had made it into RHEL 6.x. (It's on YouTube, if you missed it.) His slides did show dramatic improvements. However, they also consistently showed ext4 blowing away xfs on fs_mark in every test up until 8 threads (which covers an awful lot of common workloads). So xfs metadata performance isn't there yet, unless your workload involves 8 or more metadata-intensive threads. To its credit, xfs did scale more or less linearly, whereas ext4 (in whatever configuration he was using; he didn't say) started flagging somewhere between 5 and 8 threads.

There's no such thing as a "best filesystem". Horses for courses. Above 16TB, xfs may (or may not) rule. Below that is (in general) ext4 territory. And we'll see how things work out for the featureful btrfs. It's too early to guess, and my crystal ball is in the shop.

It's been suggested that I'm not familiar with the issues surrounding ext3's ordered mode. In fact, I'm more familiar with the history than anyone I've recently encountered. Back in '98 or '99, we didn't have any journaling fs in Linux, and I was carefully following each and every (relatively rare) post that Stephen Tweedie was making to lkml and the linux-ext2 (IIRC) list. So I know the history. I know Tweedie's thought process at the time. (Had an email exchange with him about it once.) And so I recognize that Ts'o (and others?) have managed an impressive rewriting of the history in a campaign to make dangerous practices palatable to a modern audience.

Ext3's aggressive data-syncing behavior is no accident or side effect. It was quite deliberate and intentional. And ordered mode was not all about security, but primarily about providing a sane level of data integrity, with the security features coming along for free. Tweedie is a very meticulous and careful designer who understood (and understands) that:

1. Data is more important than metadata.
2. Metadata is only important because it's required in order to work with the data.
3. It's OK to provide data-endangering options to the system administrator, but they should be turned *off* by default.
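(For concreteness, the "common idiom" I'm referring to above is overwrite-by-rename. Here is a minimal sketch, in Python, of what the application-level safe version of it looks like; the function name is mine, not from any patch set or standard library:)

    import os

    def replace_file_safely(path, data):
        """Overwrite 'path' by rename, with explicit syncs, so a crash leaves
        either the old contents or the new contents -- never a truncated or
        zero-filled file."""
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # data really on disk before the rename
        os.rename(tmp, path)          # atomic switch to the new contents
        dfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
        try:
            os.fsync(dfd)             # persist the rename itself
        finally:
            os.close(dfd)

Applications which skip those fsync calls are the ones that end up with zeroed files under metadata-only journaling plus delayed allocation; the 2.6.30-era ext4 heuristics (and, as I understand it, the later xfs equivalents) try to flush the data out at rename time on the application's behalf.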
I get the impression that few people are aware of these aspects of ext3's history and design. Probably fewer are aware that Tweedie implemented the data=journal mode *before* he implemented the ordered and writeback modes. I can certainly see where ext3's design decisions would be a thorn in the side of designers of less safe filesystems, as they result in programs which quickly show up those filesystems' design misfeatures.

While getting things closer to right than xfs, ext4 falls short of getting things really right by turning the dangerous delayed allocation behavior on by default. It should have been left as a performance optimization available to admins with workloads which allow for it.

Anyway, that's enough for me on this topic. Feel free to discuss among yourselves, but the back-and-forth on this could go on for weeks (if not more), and I don't care to allocate the time (delayed or not ;-)

Again, thank you for the discussion and info on the T310 and the general SATA issue.

Sincerely,
Steve Bergman (signing off)