On Tue, 22 Jan 2013 12:22:42 +0800, you wrote:

> We plan to use a 6+2 RAID6 to start off with.

Bad, bad idea. 6+2 RAID-6 performance sucks. If you want decent
performance, use arrays of at least 12 to 16 drives. See these tests
(all using the same HGST HUA 2 TB drives, Adaptec 5xx5 controllers, and
XFS):

 6+2 2TB RAID-6 sequential performance: write 580 MB/s, read  660 MB/s
10+2 2TB RAID-6 sequential performance: write 800 MB/s, read 1250 MB/s
14+2 2TB RAID-6 sequential performance: write 830 MB/s, read 1300 MB/s
22+2 2TB RAID-6 sequential performance: write 900 MB/s, read 1400 MB/s

As you can see, maximum controller throughput is reached at around 12 to
16 drives, and the difference in IOPS is even more pronounced. Better
controllers will give you more oomph with wider arrays (but at a higher
cost, obviously).

> Then when it gets filled up to maybe 60-70% we will expand by adding
> another 6+2 RAID6 to the array.
> The max we can grow this configuration is up to 252TB usable which
> should be enough for a year.
> Our requirements might grow up to 2PB in 2 years time if all goes
> well.

And you'll always be writing to the latest RAID only. Plan for that:
your base array needs to be fast enough to serve your planned traffic in
2 years' time, or you'll have to ditch everything and rebuild from the
ground up.

> So I have been testing all of this out on a VM running 3 vmdk's and
> using LVM to create a single logical volume of the 3 disks.
> I noticed that out of sdb, sdc and sdd, files keep getting written to
> sdc. This is probably due to our web app creating a single folder and
> all files are written under that folder.

No, this is due to the fact that LVM can't stripe across physical
volumes that are added afterwards, and that XFS can't optimize its
allocation groups after the filesystem has been grown. If you start with
one volume, then add one, and another one, you'll always be writing to
the volumes sequentially. Therefore your maximum write performance will
always be that of the current volume, which makes it all the more
important to look for the fastest possible single-volume performance
from the start.

> Is LVM a good choice of doing this configuration? Or do you have a
> better recommendation?

You could try a parallel filesystem like Lustre, PVFS2, Gluster, Ceph...
These are made precisely to overcome this kind of problem and to scale
by adding more nodes. Lustre and PVFS2 are HPC-oriented. Lustre is a
complete PITA, best reserved to specialists. PVFS2 is relatively easy to
set up, run and extend (if properly planned beforehand), and can run for
years without a glitch. Gluster and Ceph are more "internet-oriented"
and won't give you as much performance (well, actually they're pretty
slow), but they provide redundancy and on-the-fly expansion.

> Mount options:
> /dev/mapper/vg_xfs-lv_xfs on /xfs type xfs
> (rw,noatime,nodiratime,logdev=/dev/vg_xfs/lv_log_xfs,nobarrier,inode64,logbsize=262144,allocsize=512m)
>
> If I was to use the 8-disk RAID6 array with a 256kB stripe size, it
> will have a sunit of 512 and a swidth of (8-2)*512=3072.
> # mkfs.xfs -d sunit=512,swidth=3072 /dev/mapper/vg_xfs-lv_xfs
> # mount -o remount,sunit=512,swidth=3072
> Correct?
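For what it's worth, mkfs.xfs and the XFS mount options count sunit and
swidth in 512-byte sectors, so the arithmetic above is consistent: a
256kB stripe unit is 262144 / 512 = 512 sectors, and 6 data disks give
6 * 512 = 3072 sectors of stripe width.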
Don't bother with the sector arithmetic, though: use su and sw, which
take the stripe geometry directly and let mkfs work the rest out for
you: su=256k,sw=6.
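In practice that looks something like this (a minimal sketch, reusing
the device paths and mount options from your message; adjust to your
actual setup):

# mkfs.xfs -l logdev=/dev/vg_xfs/lv_log_xfs -d su=256k,sw=6 /dev/mapper/vg_xfs-lv_xfs
# mount -t xfs -o noatime,nodiratime,logdev=/dev/vg_xfs/lv_log_xfs,nobarrier,inode64,logbsize=262144,allocsize=512m /dev/mapper/vg_xfs-lv_xfs /xfs

mkfs stores the stripe geometry in the superblock, so there's no need to
repeat it (as sunit/swidth or su/sw) in the mount options; running
xfs_info /xfs afterwards will show the sunit/swidth values it derived.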