On 12/5/2013 9:58 AM, Mike Dacre wrote:
> On Thu, Dec 5, 2013 at 12:10 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>> On 12/4/2013 8:55 PM, Mike Dacre wrote:
>> ...
>
> Definitely RAID6
>
>> 2. Strip size? (eg 512KB)
>
> 64KB

Ok, so 64KB * 14 = 896KB stripe.  This seems pretty sane for a 14
spindle parity array and mixed workloads.

>> 4. BBU module?
>
> Yes.  iBBU, state optimal, 97% charged.
>
>> 5. Is write cache enabled?
>
> Yes: Cached IO and Write Back with BBU are enabled.

I should have pointed you to this earlier:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
but we've got most of it already.  We don't have your fstab mount
options.  Please provide those.

...
> This is also attached as xfs_info.txt

You're not aligning XFS to the RAID geometry (unless you're overriding
it in fstab).  No alignment is actually fine for small (<896KB) file
allocations, though less than optimal for large streaming allocation
writes.  Either way, it isn't a factor in the problems you reported.
(See the note at the end if you ever want to align it.)

...
> Good point.  These happened while trying to ls.  I am not sure why I can't
> find them in the log, they printed out to the console as 'Input/Output'
> errors, simply stating that the ls command failed.

We look for SCSI IO errors preceding an XFS error as a causal indicator.
I didn't see that here.  You could have run into the bug Ben described
earlier.  I can't really speak to the console errors.

>> With delaylog enabled, which I believe it is by default in RHEL/CentOS 6,
>> a single big rm shouldn't kill the disks.  But with the combination of
>> other workloads it seems you may have been seeking the disks to death.
>
> That is possible, workloads can get really high sometimes.  I am not sure
> how to control that without significantly impacting performance - I want a
> single user to be able to use 98% of IO capacity sometimes... but other
> times I want the load to be split amongst many users.

You can't control the seeking at the disks.  You can only schedule
workloads together that don't compete for seeks.  And if you have one
metadata or random read/write heavy workload on this SATA RAID6 array,
it will need exclusive access for the duration of its execution, or at
least for the portion that does all the random IO.  Otherwise the other
workloads running concurrently will crawl while competing for seek
bandwidth.

> Also, each user can
> execute jobs simultaneously on 23 different computers, each accessing the
> same drive via NFS.  This is a great system most of the time, but sometimes
> the workloads on the drive get really high.

So it's a small compute cluster using NFS over InfiniBand for shared
file access to a low performance RAID6 array.  The IO resource sharing
is automatic.  But AFAIK there's no easy way to enforce IO quotas on
users or processes, if it's possible at all.  You may simply not have
sufficient IO to go around.  Let's ponder that.

Looking at the math, you currently have approximately 14 * 150 = 2100
seeks/sec of capability with 14x 7.2k RPM data spindles.  Split 23 ways
that's 2100 / 23, or about 91 seeks/sec per compute node, i.e. each node
is getting roughly two thirds of the performance of a single SATA disk
from this array.  That simply isn't sufficient for servicing a 23 node
cluster, unless all workloads are compute bound and none are IO/seek
bound.  Given the overload/crash that brought you to our attention, some
of your workloads are obviously IO/seek bound.  You probably need
more/faster disks, or you need to identify which jobs are IO/seek heavy
and schedule them so they're not running concurrently.

...
>> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
>>
>> "As of kernel 3.2.12, the default i/o scheduler, CFQ, will defeat much
>> of the parallelization in XFS."
...
>> echo deadline > /sys/block/sda/queue/scheduler
>
> Wow, this is huge, I can't believe I missed that.  I have switched it to
> noop now as we use write caching.  I have been trying to figure out for a
> while why I would keep getting timeouts when the NFS load was high.  If you
> have any other suggestions for how I can improve performance, I would
> greatly appreciate it.

This may not fix the NFS timeouts entirely but it should help.  If the
NFS operations are seeking the disks to death you may still see
timeouts.  (There's a note below on making the scheduler change
persistent.)

>> This one simple command line may help pretty dramatically, immediately,
>> assuming your hardware array parameters aren't horribly wrong for your
>> workloads, and your XFS alignment correctly matches the hardware geometry.
>
> Great, thanks.  Our workloads vary considerably as we are a biology
> research lab, sometimes we do lots of seeks, other times we are almost
> maxing out read or write speed with massively parallel processes all
> accessing the disk at the same time.

Do you use munin or something similar?

Sample output:
http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/index.html#disk
Project page:
http://munin-monitoring.org/

It also has an NFS module and many others.  The storage oriented metrics
may be very helpful to you.  You would install munin-node on the NFS
server and all compute nodes, and munin on a collector/web server.  This
will let you correlate client and server NFS loads.  You can then cross
reference the timestamps in your PBS logs to see which users were
running which jobs when IO spikes occur on the NFS server.  You'll know
exactly which workloads, or combination thereof, are causing the IO
spikes.
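If you go that route, the CentOS 6 packages live in EPEL.  A rough
sketch, assuming EPEL is already enabled on the machines; the collector
IP in the allow regex is a placeholder for yours:

  # on the NFS server and every compute node
  yum install munin-node
  # let the collector poll this node (placeholder IP)
  echo 'allow ^192\.168\.1\.100$' >> /etc/munin/munin-node.conf
  service munin-node start && chkconfig munin-node on

  # on the collector/web server
  yum install munin

Each node then gets a host entry in /etc/munin/munin.conf on the
collector, and the graphs update from cron with no further fuss.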
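Also, whether or not you set up trending, plain iostat from the sysstat
package will tell you on the spot whether the array is being seeked to
death during a slowdown.  Something like this on the NFS server, where
sda is an assumption, so substitute whichever device backs the array:

  yum install sysstat     # if not already present
  iostat -x 5 sda         # extended device stats every 5 seconds

A large await with %util pinned near 100 while throughput stays low is
the classic signature of a seek bound load rather than a bandwidth
bound one.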
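One more note on the scheduler change: the echo into sysfs doesn't
survive a reboot.  On CentOS 6 you can either append elevator=noop to
the kernel line in /boot/grub/grub.conf, or simply re-issue the echo at
boot, e.g. from /etc/rc.local:

  # reapply the scheduler choice at every boot
  echo noop > /sys/block/sda/queue/scheduler

Again sda is a placeholder; repeat the line for each block device
behind the array if there's more than one.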
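And on the alignment point from earlier: if you ever do want XFS
aligned to this array, there's no need to remake the filesystem, as the
geometry can be overridden at mount time.  A sketch of the fstab entry
derived from your 64KB strip and 14 data spindles, with the device and
mount point as placeholders.  The sunit/swidth mount options are given
in 512-byte units, so 64KB = 128 and 14 * 64KB = 896KB = 1792:

  /dev/sdX   /srv/data   xfs   defaults,sunit=128,swidth=1792   0 0

As noted above this would favor your large streaming writes at some
cost to small file allocation, so it's a judgment call, not a must.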
-- 
Stan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs