(not subscribed to lists, please keep me on CC)
After upgrading from 4.17.2 to 4.19.2, my 9305-16i started faulting under
load (well, not much load, given it's all spinning disks, but roughly once
every one to five hours during a btrfs rebalance). Usually all the disks
came back online and the only effect was about 30 seconds of the disks
being offline, but occasionally the controller would drop offline
completely. dmesg is below, though it's not very useful. Downgrading to
4.17.2 again fixed the issue completely.

[19983.155887] mpt3sas_cm0: fault_state(0x5853)!
[19983.155932] mpt3sas_cm0: sending diag reset !!
[19984.093087] mpt3sas_cm0: diag reset: SUCCESS
[19984.155056] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[19984.301067] mpt3sas_cm0: _base_display_fwpkg_version: complete
[19984.301400] mpt3sas_cm0: LSISAS3224: FWVersion(09.00.100.00), ChipRevision(0x01), BiosVersion(00.00.00.00)
[19984.301403] mpt3sas_cm0: Protocol=(
[19984.301404] Initiator
[19984.301405] ,Target
[19984.301406] ),
[19984.301407] Capabilities=(
[19984.301408] TLR
[19984.301410] ,EEDP
[19984.301411] ,Snapshot Buffer
[19984.301412] ,Diag Trace Buffer
[19984.301413] ,Task Set Full
[19984.301414] ,NCQ
[19984.301415] )
[19984.301473] mpt3sas_cm0: sending port enable !!
[19995.149962] mpt3sas_cm0: port enable: SUCCESS
[19995.150077] mpt3sas_cm0: search for end-devices: start
[19995.151143] scsi target0:0:0: handle(0x0019), sas_addr(0x4433221100000000)
[19995.151147] scsi target0:0:0: enclosure logical id(0x500062b203842300), slot(3)
[19995.151196] scsi target0:0:1: handle(0x001a), sas_addr(0x4433221103000000)
[19995.151199] scsi target0:0:1: enclosure logical id(0x500062b203842300), slot(1)
[19995.151245] scsi target0:0:3: handle(0x001b), sas_addr(0x4433221105000000)
[19995.151247] scsi target0:0:3: enclosure logical id(0x500062b203842300), slot(6)
[19995.151293] scsi target0:0:2: handle(0x001c), sas_addr(0x4433221104000000)
[19995.151296] scsi target0:0:2: enclosure logical id(0x500062b203842300), slot(7)
[19995.151342] scsi target0:0:4: handle(0x001d), sas_addr(0x4433221106000000)
[19995.151345] scsi target0:0:4: enclosure logical id(0x500062b203842300), slot(4)
[19995.151391] scsi target0:0:6: handle(0x001e), sas_addr(0x4433221110000000)
[19995.151393] scsi target0:0:6: enclosure logical id(0x500062b203842300), slot(11)
[19995.151439] scsi target0:0:7: handle(0x001f), sas_addr(0x4433221111000000)
[19995.151441] scsi target0:0:7: enclosure logical id(0x500062b203842300), slot(10)
[19995.151487] scsi target0:0:9: handle(0x0020), sas_addr(0x4433221113000000)
[19995.151490] scsi target0:0:9: enclosure logical id(0x500062b203842300), slot(9)
[19995.151535] scsi target0:0:10: handle(0x0021), sas_addr(0x4433221112000000)
[19995.151537] scsi target0:0:10: enclosure logical id(0x500062b203842300), slot(8)
[19995.151607] mpt3sas_cm0: search for end-devices: complete
[19995.151609] mpt3sas_cm0: search for end-devices: start
[19995.151611] mpt3sas_cm0: search for PCIe end-devices: complete
[19995.151613] mpt3sas_cm0: search for expanders: start
[19995.151614] mpt3sas_cm0: search for expanders: complete
[19995.151624] mpt3sas_cm0: _base_fault_reset_work: hard reset: success
[19995.151630] mpt3sas_cm0: removing unresponding devices: start
[19995.151631] mpt3sas_cm0: removing unresponding devices: end-devices
[19995.151633] mpt3sas_cm0: Removing unresponding devices: pcie end-devices
[19995.151635] mpt3sas_cm0: removing unresponding devices: expanders
[19995.151636] mpt3sas_cm0: removing unresponding devices: complete
[19995.151642] mpt3sas_cm0: scan devices: start
[19995.152075] mpt3sas_cm0: scan devices: expanders start
[19995.152139] mpt3sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[19995.152141] mpt3sas_cm0: scan devices: expanders complete
[19995.152142] mpt3sas_cm0: scan devices: end devices start
[19995.156007] mpt3sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[19995.156009] mpt3sas_cm0: scan devices: end devices complete
[19995.156010] mpt3sas_cm0: scan devices: pcie end devices start
[19995.156028] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[19995.156047] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[19995.156053] mpt3sas_cm0: break from pcie end device scan: ioc_status(0x0021), loginfo(0x3003011d)
[19995.156054] mpt3sas_cm0: pcie devices: pcie end devices complete
[19995.156055] mpt3sas_cm0: scan devices: complete
[19995.650024] sd 0:0:0:0: Power-on or device reset occurred
[19995.650052] sd 0:0:6:0: Power-on or device reset occurred
[19996.239565] sd 0:0:10:0: Power-on or device reset occurred
[19996.341924] sd 0:0:9:0: Power-on or device reset occurred
[19996.650155] sd 0:0:1:0: Power-on or device reset occurred
[19996.650184] sd 0:0:3:0: Power-on or device reset occurred
[19996.650197] sd 0:0:2:0: Power-on or device reset occurred
[19996.650205] sd 0:0:4:0: Power-on or device reset occurred
[19996.650214] sd 0:0:7:0: Power-on or device reset occurred
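
In case it helps whoever looks at this: the log_info words in the scan section can be split by hand. Below is a minimal Python sketch, assuming the bitfield layout mpt3sas uses for log_info (bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode) and the originator names IOP/PL/IR. `decode_loginfo` is just a hypothetical helper of mine, not anything in the driver. It reproduces the driver's own decoding of 0x3003011d and applies the same split to the 0x310f0400 loginfo from the expander/end-device scans, but it says nothing about what the codes actually mean.

```python
# Hypothetical helper (not part of mpt3sas): split a log_info word into fields.
# Assumed layout: bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode.

# Assumed originator names, matching the "originator(IOP)" text in the log.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(loginfo: int) -> dict:
    """Return the bit fields of a 32-bit log_info value."""
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": ORIGINATORS.get((loginfo >> 24) & 0xF, "unknown"),
        "code": (loginfo >> 16) & 0xFF,
        "sub_code": loginfo & 0xFFFF,
    }


# Decode the two values that appear in the dmesg output above.
for value in (0x3003011D, 0x310F0400):
    f = decode_loginfo(value)
    print(f"log_info(0x{value:08x}): originator({f['originator']}), "
          f"code(0x{f['code']:02x}), sub_code(0x{f['sub_code']:04x})")
```

For 0x3003011d this prints the same originator(IOP), code(0x03), sub_code(0x011d) line the driver logged; 0x310f0400 comes out as originator(PL), code(0x0f), sub_code(0x0400).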