Hi Stan,
On Thu, Dec 5, 2013 at 12:10 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
On 12/4/2013 8:55 PM, Mike Dacre wrote:
...
> I have a 16 2TB drive RAID6 array powered by an LSI 9240-4i. It has an XFS.
It's a 9260-4i, not a 9240, a huge difference. I went digging through
your dmesg output because I knew the 9240 doesn't support RAID6. A few
questions. What is the LSI RAID configuration?
You are right, sorry. 9260-4i
1. Level -- confirm RAID6
Definitely RAID6
2. Strip size? (eg 512KB)
64KB
3. Stripe size? (eg 7168KB, 14*256)
Not sure how to get this
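(If I have the arithmetic right, it can be derived from the numbers above rather than read off the controller: a 16-drive RAID6 leaves 14 data spindles, so
stripe width = strip size x data disks = 64KB x 14 = 896KB
This assumes the controller really is using the 64KB strip, which the Strip Size field in the attached drive summary seems to confirm.)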
4. BBU module?
Yes. iBBU, state optimal, 97% charged.
5. Is write cache enabled?
Yes: Cached IO and Write Back with BBU are enabled.
I have also attached an adapter summary (megaraid_adp_info.txt) and a virtual and physical drive summary (megaraid_drive_info.txt).
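(In case anyone wants to regenerate these, something like the following MegaCli invocations should produce equivalent output; the /opt/MegaRAID/MegaCli install path is an assumption on my part.)
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL    # adapter summary
/opt/MegaRAID/MegaCli/MegaCli64 -LDPDInfo -aALL      # virtual drive + physical drive summary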
What is the XFS geometry?
5. xfs_info /dev/sda
`xfs_info /dev/sda1`
meta-data ="" isize=256 agcount=26, agsize=268435455 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=6835404288, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
This is also attached as xfs_info.txt
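For what it's worth, the sunit=0/swidth=0 above means the filesystem carries no stripe alignment hints, so XFS does not know the RAID geometry. If the array really is 64KB strips across 14 data spindles, a rebuilt filesystem would presumably be aligned with something like the line below (a sketch only; it is for a future rebuild, not the live array):
# WARNING: mkfs.xfs recreates the filesystem and destroys the existing data
mkfs.xfs -d su=64k,sw=14 /dev/sda1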
A combination of these being wrong could very well be part of your
problems.
...
> IO errors when any requests were made. This happened while it was being
I didn't see any IO errors in your dmesg output. None.
Good point. These happened while trying to ls. I am not sure why I can't find them in the log; they were printed to the console as 'Input/Output' errors, simply stating that the ls command had failed.
> accessed by 5 different users, one was doing a very large rm operation (rm
> *sh on thousands of files in a directory). Also, about 30 minutes before
> we had connected the globus connect endpoint to allow easy file transfers
> to SDSC.
With delaylog enabled, which I believe it is in RHEL/CentOS 6, a single
big rm shouldn't kill the disks. But with the combination of other
workloads it seems you may have been seeking the disks to death.
That is possible; workloads can get really high sometimes. I am not sure how to control that without significantly impacting performance - I want a single user to be able to use 98% of the IO capacity sometimes, but at other times I want the load to be split amongst many users. Also, each user can execute jobs simultaneously on 23 different computers, each accessing the same drive via NFS. This is a great system most of the time, but sometimes the workloads on the drive get really high.
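One untested idea for capping a single heavy user would be cgroup blkio throttling, roughly along the lines below. This assumes the RHEL 6 kernel in use has the blkio throttle support; the device numbers, the 100MB/s limit, and the 'heavyuser' group name are only for illustration:
# RHEL/CentOS 6 normally mounts the blkio controller under /cgroup/blkio
mkdir -p /cgroup/blkio/heavyuser
# cap reads from /dev/sda (major:minor 8:0 here; check with ls -l /dev/sda) at ~100MB/s
echo "8:0 104857600" > /cgroup/blkio/heavyuser/blkio.throttle.read_bps_device
# move one of the user's processes into the group ($PID is a placeholder)
echo $PID > /cgroup/blkio/heavyuser/tasks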
...
> In the end, I successfully repaired the filesystem with `xfs_repair -L
> /dev/sda1`. However, I am nervous that some files may have been corrupted.
I'm sure your users will let you know. I'd definitely have a look in
the directory that was targeted by the big rm operation, which apparently
didn't finish when XFS shut down.
> Do any of you have any idea what could have caused this problem?
Yes. A few things. The first is this, and it's a big one:
Dec 4 18:15:28 fruster kernel: io scheduler noop registered
Dec 4 18:15:28 fruster kernel: io scheduler anticipatory registered
Dec 4 18:15:28 fruster kernel: io scheduler deadline registered
Dec 4 18:15:28 fruster kernel: io scheduler cfq registered (default)
http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
"As of kernel 3.2.12, the default i/o scheduler, CFQ, will defeat much
of the parallelization in XFS."
*Never* use the CFQ elevator with XFS, and never with a high performance
storage system. In fact, IMHO, never use CFQ period. It was horrible
even before 3.2.12. It is certain that CFQ is playing a big part in
your 120s timeouts, though it may not be solely responsible for your IO
bottleneck. Switch to deadline or noop immediately, deadline if LSI
write cache is disabled, noop if it is enabled. Execute this manually
now, and add it to a startup script and verify it is being set at
startup, as it's not permanent:
echo deadline > /sys/block/sda/queue/scheduler
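A minimal way to make that stick on CentOS 6 is a couple of lines in /etc/rc.d/rc.local, which runs at boot (sda assumed; use noop instead of deadline if the controller write cache is enabled, as noted above):
# append to /etc/rc.d/rc.local so the elevator change survives reboots
echo deadline > /sys/block/sda/queue/scheduler
# verify: the active scheduler is shown in square brackets
cat /sys/block/sda/queue/scheduler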
Wow, this is huge - I can't believe I missed that. I have switched it to noop now, as we use write caching. I have been trying to figure out for a while why I kept getting timeouts when the NFS load was high. If you have any other suggestions for how I can improve performance, I would greatly appreciate it.
This one simple command line may help pretty dramatically, immediately,
assuming your hardware array parameters aren't horribly wrong for your
workloads, and your XFS alignment correctly matches the hardware geometry.
Great, thanks. Our workloads vary considerably as we are a biology research lab: sometimes we do lots of seeks, other times we are almost maxing out read or write speed with massively parallel processes all accessing the disk at the same time.
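If it would help to put numbers on those two extremes, a quick fio run for each pattern is one way to do it - the file path, size, and job counts below are just placeholders:
# small random reads, lots of seeking
fio --name=seeky --filename=/mnt/array/fio.test --size=4G --rw=randread --bs=4k --numjobs=16 --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting
# large sequential reads, close to streaming speed
fio --name=stream --filename=/mnt/array/fio.test --size=4G --rw=read --bs=1M --numjobs=4 --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting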
--
Stan
-Mike
Adapter #0
Versions: Product Name: LSI MegaRAID SAS 9260-4i | Serial No: SV14821972 | FW Package Build: 12.14.0-0167
Mfg. Data: Mfg. Date: 11/24/11 | Rework Date: 00/00/00 | Revision No: 61A | Battery FRU: N/A
Image Versions in Flash: FW Version: 2.130.393-2551 | BIOS Version: 3.28.00_4.14.05.00_0x05270000 | Preboot CLI Version: 04.04-020:#%00009 | WebBIOS Version: 6.0-52-e_48-Rel | NVDATA Version: 2.09.03-0045 | Boot Block Version: 2.02.00.00-0000 | BOOT Version: 09.250.01.219
Pending Images in Flash: None
PCI Info: Controller Id: 0000 | Vendor Id: 1000 | Device Id: 0079 | SubVendorId: 1000 | SubDeviceId: 9260 | Host Interface: PCIE | ChipRevision: B4 | Link Speed: 0 | Number of Frontend Port: 0 | Device Interface: PCIE | Number of Backend Port: 4 | Port Addresses: 0=500304800129497f, 1=0000000000000000, 2=0000000000000000, 3=0000000000000000
HW Configuration: SAS Address: 500605b004137820 | BBU: Present | Alarm: Present | NVRAM: Present | Serial Debugger: Present | Memory: Present | Flash: Present | Memory Size: 512MB | TPM: Absent | On board Expander: Absent | Upgrade Key: Absent | Temperature sensor for ROC: Absent | Temperature sensor for controller: Absent
Settings: Current Time: 7:21:54 12/5, 2013 | Predictive Fail Poll Interval: 300sec | Interrupt Throttle Active Count: 16 | Interrupt Throttle Completion: 50us | Rebuild Rate: 30% | PR Rate: 30% | BGI Rate: 30% | Check Consistency Rate: 30% | Reconstruction Rate: 30% | Cache Flush Interval: 4s | Max Drives to Spinup at One Time: 4 | Delay Among Spinup Groups: 2s | Physical Drive Coercion Mode: Disabled | Cluster Mode: Disabled | Alarm: Enabled | Auto Rebuild: Enabled | Battery Warning: Enabled | Ecc Bucket Size: 15 | Ecc Bucket Leak Rate: 1440 Minutes | Restore HotSpare on Insertion: Disabled | Expose Enclosure Devices: Enabled | Maintain PD Fail History: Enabled | Host Request Reordering: Enabled | Auto Detect BackPlane Enabled: SGPIO/i2c SEP | Load Balance Mode: Auto | Use FDE Only: No | Security Key Assigned: No | Security Key Failed: No | Security Key Not Backedup: No | Default LD PowerSave Policy: Controller Defined | Maximum number of direct attached drives to spin up in 1 min: 120 | Auto Enhanced Import: Yes | Any Offline VD Cache Preserved: No | Allow Boot with Preserved Cache: No | Disable Online Controller Reset: No | PFK in NVRAM: No | Use disk activity for locate: No | POST delay: 90 seconds | BIOS Error Handling: Stop On Errors | Current Boot Mode: Normal
Capabilities: RAID Level Supported: RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span | Supported Drives: SAS, SATA | Allowed Mixing: Mix in Enclosure Allowed; Mix of SAS/SATA of HDD type in VD Allowed; Mix of SAS/SATA of SSD type in VD Allowed; Mix of SSD/HDD in VD Allowed
Status: ECC Bucket Count: 0
Limitations: Max Arms Per VD: 32 | Max Spans Per VD: 8 | Max Arrays: 128 | Max Number of VDs: 64 | Max Parallel Commands: 1008 | Max SGE Count: 80 | Max Data Transfer Size: 8192 sectors | Max Strips PerIO: 42 | Max LD per array: 16 | Min Strip Size: 8 KB | Max Strip Size: 1.0 MB | Max Configurable CacheCade Size: 0 GB | Current Size of CacheCade: 0 GB | Current Size of FW Cache: 346 MB
Device Present: Virtual Drives: 1 | Degraded: 0 | Offline: 0 | Physical Devices: 18 | Disks: 16 | Critical Disks: 0 | Failed Disks: 0
Supported Adapter Operations: Rebuild Rate: Yes | CC Rate: Yes | BGI Rate: Yes | Reconstruct Rate: Yes | Patrol Read Rate: Yes | Alarm Control: Yes | Cluster Support: No | BBU: Yes | Spanning: Yes | Dedicated Hot Spare: Yes | Revertible Hot Spares: Yes | Foreign Config Import: Yes | Self Diagnostic: Yes | Allow Mixed Redundancy on Array: No | Global Hot Spares: Yes | Deny SCSI Passthrough: No | Deny SMP Passthrough: No | Deny STP Passthrough: No | Support Security: No | Snapshot Enabled: No | Support the OCE without adding drives: Yes | Support PFK: Yes | Support PI: No | Support Boot Time PFK Change: No | Disable Online PFK Change: No | PFK TrailTime Remaining: 0 days 0 hours | Support Shield State: No | Block SSD Write Disk Cache Change: No | Support Online FW Update: Yes
Supported VD Operations: Read Policy: Yes | Write Policy: Yes | IO Policy: Yes | Access Policy: Yes | Disk Cache Policy: Yes | Reconstruction: Yes | Deny Locate: No | Deny CC: No | Allow Ctrl Encryption: No | Enable LDBBM: Yes | Support Breakmirror: No | Power Savings: No
Supported PD Operations: Force Online: Yes | Force Offline: Yes | Force Rebuild: Yes | Deny Force Failed: No | Deny Force Good/Bad: No | Deny Missing Replace: No | Deny Clear: No | Deny Locate: No | Support Temperature: Yes | Disable Copyback: No | Enable JBOD: No | Enable Copyback on SMART: No | Enable Copyback to SSD on SMART Error: Yes | Enable SSD Patrol Read: No | PR Correct Unconfigured Areas: Yes | Enable Spin Down of UnConfigured Drives: Yes | Disable Spin Down of hot spares: No | Spin Down time: 30 | T10 Power State: No
Error Counters: Memory Correctable Errors: 0 | Memory Uncorrectable Errors: 0
Cluster Information: Cluster Permitted: No | Cluster Active: No
Default Settings: Phy Polarity: 0 | Phy PolaritySplit: 0 | Background Rate: 30 | Strip Size: 256kB | Flush Time: 4 seconds | Write Policy: WB | Read Policy: RA | Cache When BBU Bad: Disabled | Cached IO: No | SMART Mode: Mode 6 | Alarm Disable: Yes | Coercion Mode: None | ZCR Config: Unknown | Dirty LED Shows Drive Activity: No | BIOS Continue on Error: 3 | Spin Down Mode: None | Allowed Device Type: SAS/SATA Mix | Allow Mix in Enclosure: Yes | Allow HDD SAS/SATA Mix in VD: Yes | Allow SSD SAS/SATA Mix in VD: Yes | Allow HDD/SSD Mix in VD: Yes | Allow SATA in Cluster: No | Max Chained Enclosures: 16 | Disable Ctrl-R: Yes | Enable Web BIOS: Yes | Direct PD Mapping: No | BIOS Enumerate VDs: Yes | Restore Hot Spare on Insertion: No | Expose Enclosure Devices: Yes | Maintain PD Fail History: Yes | Disable Puncturing: No | Zero Based Enclosure Enumeration: No | PreBoot CLI Enabled: Yes | LED Show Drive Activity: Yes | Cluster Disable: Yes | SAS Disable: No | Auto Detect BackPlane Enable: SGPIO/i2c SEP | Use FDE Only: No | Enable Led Header: Yes | Delay during POST: 0 | EnableCrashDump: No | Disable Online Controller Reset: No | EnableLDBBM: Yes | Un-Certified Hard Disk Drives: Allow | Treat Single span R1E as R10: No | Max LD per array: 16 | Power Saving option: Don't Auto spin down Configured Drives (Max power savings option is not allowed for LDs; only T10 power conditions are to be used.) | Default spin down time in minutes: 30 | Enable JBOD: No | TTY Log In Flash: No | Auto Enhanced Import: Yes | BreakMirror RAID Support: Yes | Disable Join Mirror: No | Enable Shield State: No | Time taken to detect CME: 60s
Exit Code: 0x00
System: Operating System: Linux version 2.6.32-358.23.2.el6.x86_64 | Driver Version: 06.504.01.00-rh1 | CLI Version: 8.07.07
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 25.463 TB
Sector Size : 512
Is VD emulated : No
Parity Size : 3.637 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 16
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: No
Exit Code: 0x00
Hardware:
Controller: ProductName: LSI MegaRAID SAS 9260-4i (Bus 0, Dev 0) | SAS Address: 500605b004137820 | FW Package Version: 12.14.0-0167 | Status: Optimal
BBU: BBU Type: iBBU | Status: Healthy
Enclosure: Product Id: SAS2X28 | Type: SES | Status: OK
Enclosure: Product Id: SGPIO | Type: SGPIO | Status: OK
PD: all 16 drives are on Connector: Port 0 - 3 <Internal> <Encl Pos 1>, Slots 0-15, and each reports Vendor Id: ATA | Product Id: WDC WD2002FAEX-0 | State: Online | Disk Type: SATA, Hard Disk | Device Capacity: 1.818 TB | Power State: Active
Storage Virtual Drives: Virtual drive: Target Id 0, VD name | Size: 25.463 TB | State: Optimal | RAID Level: 6
Exit Code: 0x00
meta-data=/dev/sda1              isize=256    agcount=26, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=6835404288, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs