On Wed, Jun 13, 2012 at 10:54:04AM +0200, Matthew Whittaker-Williams wrote:
> On 6/13/12 3:19 AM, Dave Chinner wrote:
> >
> >With the valid stack traces, I see that it isn't related to the log,
> >though.
> 
> Ah ok, we are triggering a new issue?

No, your system appears to be stalling waiting for IO completion.

> >>RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
> >>Size                : 40.014 TB
> >>State               : Optimal
> >>Strip Size          : 64 KB
> >>Number Of Drives    : 24
> .....
> >>Virtual Drive: 1 (Target Id: 1)
> >>Name                :
> >>RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
> >>Size                : 40.014 TB
> >>State               : Optimal
> >>Strip Size          : 1.0 MB
> >>Number Of Drives    : 24
> >
> >OOC, any reason for the different stripe sizes on the two
> >RAID volumes?
> 
> This is a fluke, we are running several new systems and this is just
> one of the new servers, which indeed has the wrong stripe size set -
> it should be 1MB. We actually found a stripe size of 1MB to give
> better performance overall than 64/256/512.

So if you fix that, does the problem go away?

> >And that is sync waiting for the flusher thread to complete
> >writeback of all the dirty inodes. The lack of other stall messages
> >at this time makes it pretty clear that the problem is not
> >filesystem related - the system is simply writeback IO bound.
> >
> >The reason, I'd suggest, is that you've chosen the wrong RAID volume
> >type for your workload. Small random file read and write workloads
> >like news and mail spoolers are IOPS intensive workloads and do
> >not play well with RAID5/6. RAID5/6 really only work well for large
> >files with sequential access patterns - you need to use RAID1/10 for
> >IOPS intensive workloads because they don't suffer from the RMW
> >cycle problem that RAID5/6 has for small writes. The iostat output
> >will help clarify whether this is really the problem or not...
> 
> I understand that RAID 10 performs better for reads on small file
> sets. But with RAID 10 we of course lose a lot of disk space
> compared to RAID 6. Side note to this: we have been running RAID 6
> for years now without any issues.

But have you been running 24-disk RAID6 volumes? With RAID5/6, the
number of disks in the volume really matters - for small write IOs,
the more disks in the RAID6 volume, the slower it will be...

> In the past we did tune our xfs filesystem with switches like
> sunit and swidth. But back then we couldn't see much performance
> difference between using:
> 
> mkfs.xfs -f -L P.01 -l lazy-count=1 -d su=1m,sw=22 /dev/sda
> 
> and
> 
> mkfs.xfs -f -L P.01 -l lazy-count=1 /dev/sda

You won't see much difference with the BBWC enabled. It does affect
how files and inodes are allocated, though, so the aging
characteristics of the filesystem will be better for an aligned
filesystem. i.e. you might not notice the performance now, but after
a couple of years in production you probably will...
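As a rough sketch of what I mean (the device name is just an example,
and the expected sunit/swidth numbers assume su=1m,sw=22 with 4k
blocks), something like this would capture the iostat data and show
whether the filesystem was actually made with stripe alignment:

# extended per-device stats in MB/s, sampled every 5 seconds while
# the stalls are happening
iostat -x -d -m 5

# check whether the existing filesystem was made aligned; with
# su=1m,sw=22 and 4k blocks you'd expect sunit=256, swidth=5632 blks
xfs_info /dev/sda | grep -E 'sunit|swidth'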
> xfs_info from a system that shows no problems with an H800
> controller from Dell (same chipset as the LSI controllers):
> 
> Product Name    : PERC H800 Adapter
> Serial No       : 071002C
> FW Package Build: 12.10.1-0001
> 
> sd60:~# xfs_info /dev/sda
> meta-data=/dev/sda          isize=256    agcount=58, agsize=268435455 blks
>          =                  sectsz=512   attr=2
> data     =                  bsize=4096   blocks=15381037056, imaxpct=1
>          =                  sunit=0      swidth=0 blks
> naming   =version 2         bsize=4096   ascii-ci=0
> log      =internal          bsize=4096   blocks=521728, version=2
>          =                  sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none              extsz=4096   blocks=0, rtextents=0
> 
> Where we even have bigger spools:

You have larger drives, not a wider RAID volume. That's a 23-disk
wide, 3TB drive RAID6 volume. And it's on a different controller
with different firmware, so there's lots different here...

> Aside from the wrong stripe size and write alignment, this still
> should not cause the kernel to crash like this.

The kernel is not crashing. It's emitting warnings that indicate the
IO subsystem is overloaded.

> We found that running with a newer LSI driver it takes a bit longer
> for the kernel to crash, but it still does.

Which indicates the problem is almost certainly related to the
storage configuration or drivers, not the filesystem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx