[Bug 206253] New: mpt3sas driver crash under heavy load

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=206253

            Bug ID: 206253
           Summary: mpt3sas driver crash under heavy load
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 3.10.0-957.27.2.el7.x86_64
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@xxxxxxxxxxxxxxx
          Reporter: itoufiqu@xxxxxxx
        Regression: No

Created attachment 286877
  --> https://bugzilla.kernel.org/attachment.cgi?id=286877&action=edit
console screenshot

Hi guys, 

I am new to this site, so if this problem is a repeat, my apology in advance.

I have a Dell R7425 server with dual AMD EPYC-7301 CPUs, 256GB of RAM, with
dual SAS 3008 cards. OS is centOS 7.6.  

I have 60-bay JBOD connected to these 2 HBA cards, all bays are full with HGST
SAS 10TB drives.  We run BeeGFS (with ZFS at the backend) in this system.  This
is a brand new setup and we noticed that under heavy load after a while , the
system completely freezes up.  It then needs a hard reboot.  I ran starce-ng
for 2 days on the CPUs and RAM, the system was stable.  I torture tested the
boot drives as well for 2 days, nothing came out.  Everything seems normal. 
the problem seems come in when we start stressing the drives in the JBOD,
connected with the HBA card. I have the system configured with dual connected
multipath ( roound-robin ) between both of the HBA cards and primary and
secondary SAS expanders of the of JBOD.

Since this is a Dell server, I went to Dell's support website, and found a
newer mpt3sas driver and installed it.  OS installed driver version was
16.100.01.00 , and updated version now is 27.00.01.00 .

This morning, the system hung again, and I was able to capture something from
the console.  I have attached the screenshot.  

The error was, mpt3sas_cm0 fault_state(0x5854!)

Below is the current modinfo output ( in brief ) from the current mpt3sas
driver:

filename:      
/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias:          mpt2sas
version:        27.00.01.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 & SAS 3.5 Device Driver
author:         Broadcom Inc. <MPT-FusionLinux.pdl@xxxxxxxxxxxx>
retpoline:      Y
rhelversion:    7.6
srcversion:     26E62E1FFC69FC8709F8CD7



I have no idea what to do here.  What can I do to fix this issue? Do I need a
special configuration?  A new driver? Our file servers are usually under
moderate to high load.  We have a 1.6PB system here ( BeeGFS + ZFS at the
backend ), with CentOS 6.9.  that system run pretty solid without much hiccup. 
that system also has a HBA 3008 card.

Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux