On 7/19/2011 3:37 AM, Emmanuel Florac wrote:
> On Mon, 18 Jul 2011 14:58:55 -0500 you wrote:
>
>> card: MegaRAID SAS 9260-16i
>> disks: 14x Barracuda® XT ST33000651AS 3TB (2 hot spares).
>> RAID6
>> ~ 30TB
>
> This card doesn't activate the write cache without a BBU present. Be
> sure you have a BBU or the performance will always be unbearably awful.

In addition to all the other recommendations: once the BBU is installed, disable the individual drive caches (if this isn't done automatically) and set the controller cache mode to 'write back'. The write through and direct I/O cache modes will deliver horrible RAID6 write performance. (A command sketch for these settings is appended at the end of this message.)

And, BTW, RAID6 is a horrible choice for a parallel, small file, high random I/O workload such as you've described. RAID10 would be much more suitable. Actually, any striped RAID is less than optimal for such a small file workload.

The default stripe size for the LSI RAID controllers, IIRC, is 64KB. With 14 spindles of stripe width you end up with 64KB * 14 = 896KB. XFS will try to pack as many of these 50-150KB files as it can into a single extent, but you're talking 6 to 18 files per extent, and this is wholly dependent on the parallel write pattern and on which of the allocation groups XFS decides to write each file. XFS isn't going to be 100% efficient in this case. Thus, you will end up with many partial stripe width writes, eliminating much of the performance advantage of striping.

These are large 7200 rpm SATA drives, which have poor seek performance to begin with, unlike the 'small' 300GB 15k SAS drives. You're degrading that already poor seek performance further by:

1. Using double parity striped RAID
2. Writing thousands of small files in parallel

This workload is very similar to that of a mail server using the maildir storage format. If you read the list archives you'll see recommendations for an optimal storage stack setup for this workload. It goes something like this:

1. Create a linear array of hardware RAID1 mirror sets. Do this all in the
   controller if it can do it. If not, use Linux RAID (mdadm) to create a
   '--linear' array of the multiple (7 in your case, apparently) hardware
   RAID1 mirror sets.

2. Now let XFS handle the write parallelism. Format the resulting 7-spindle
   Linux RAID device with, for example:

   mkfs.xfs -d agcount=14 /dev/md0

   (A fuller command sketch for this layout is appended at the end of this
   message.)

By using this configuration you eliminate the excessive head seeking associated with the partial stripe width write problem of RAID6, restoring performance efficiency to the array. Using 14 allocation groups allows XFS to write, at minimum, 14 such files in parallel. This may not seem like a lot given you have ~200 writers, but it's actually far more than what you're getting now, or what you'll get with striped parity RAID.

Consider the 150KB file case: 14 * 150KB = 2.1MB per batch of parallel writes. Assuming this hardware and software stack can sink 210MB/s with this workload, that's ~1400 files written per second, or 84,000 files per minute. Would this be sufficient for your application?

Now that we've covered the XFS and hardware RAID side of this equation, does your application run directly on this machine, or are you writing over NFS or CIFS to this XFS filesystem? If so, that's another fly in the ointment we may have to deal with.

--
Stan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
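
[Appended sketch 1] A minimal sketch of the controller cache settings discussed above, assuming LSI's MegaCli utility is available (the binary name, e.g. MegaCli vs. MegaCli64, and the -LAll/-aAll selectors are assumptions; adjust for your installation and adapter/LD numbering):

   # set the logical drive cache policy to write back
   # (only safe once a healthy BBU is present)
   MegaCli64 -LDSetProp WB -LAll -aAll

   # disable the individual drive caches behind the controller
   MegaCli64 -LDSetProp -DisDskCache -LAll -aAll

   # verify the resulting controller and drive cache policies
   MegaCli64 -LDGetProp -Cache -LAll -aAll
   MegaCli64 -LDGetProp -DskCache -LAll -aAll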
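
[Appended sketch 2] And a minimal sketch of the linear-over-mirrors layout described in steps 1 and 2, assuming the controller exports the seven hardware RAID1 mirror sets to Linux as /dev/sdb through /dev/sdh (these device names are placeholders; substitute whatever your controller actually presents):

   # concatenate the 7 hardware RAID1 mirror sets into one linear md device
   mdadm --create /dev/md0 --level=linear --raid-devices=7 \
       /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

   # 14 allocation groups = 2 AGs per mirror pair, so XFS can issue
   # at least 14 file writes in parallel across the array
   mkfs.xfs -d agcount=14 /dev/md0

This is a sketch of the general technique, not a definitive recipe; if the controller can build the linear concatenation itself, skip the mdadm step and run mkfs.xfs directly against the exported volume.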