Re: [PATCH] scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0

On Mon, Apr 5, 2021 at 9:00 PM Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>
>
> Joe,
>
> > Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> > controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> > replicated for SAS-3.0 HBAs. This change replicates this behavior.
>
> Do you have a system that exhibits problems with ASPM enabled?

I am not sure.

I get intermittent messages in dmesg (included below) and stumbled upon
commit ffdadd68af5a while researching them; it looked similar to what I
am seeing.

I haven't found a way to reproduce this issue easily or reliably, but
it surfaces as dmesg reporting an unknown NMI, followed by all the
disks suddenly going offline. Some sort of controller fault appears to
be occurring, judging by the dmesg line which says "mpt3sas_cm0:
_base_fault_reset_work: Running mpt3sas_dead_ioc thread success."

My naive thought process was that:

- A message from Sreekanth back in ~2016 suggested that ASPM should be
disabled explicitly for SAS-2.0 [1]; perhaps this is also true for
SAS-3.0?
- I am not sure, but disabling ASPM for SAS-3.0 probably wouldn't
negatively impact users.
- Disabling ASPM explicitly in the driver only has an effect if the
BIOS has given the kernel control of ASPM, but it could be a good
safeguard.
- It may (or may not) reduce the incidence of the event I sporadically see.
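
Since the driver-side disable only matters when the kernel is managing
ASPM, it can help to know which ASPM policy the kernel is currently
using. A minimal sketch, assuming the standard pcie_aspm module
parameter in sysfs, where the active policy is the one shown in square
brackets. The helper names are hypothetical and not part of the patch,
and the policy alone does not prove whether the BIOS handed ASPM
control to the OS.

```python
import re
from pathlib import Path

# Standard sysfs location of the pcie_aspm policy parameter.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_aspm_policy(path=POLICY_PATH):
    """Read the policy file; return None if it is missing or unreadable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_aspm_policy())
```

If this reports "performance", ASPM is already off system-wide and the
patch would not be expected to change behaviour on that machine.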

Is there a way to induce ASPM events so that I could test this? Or
perhaps can I tweak the fault handler to get more information about
the specific type of fault?

All in all I figured the change was relatively harmless and could
reduce the incidence of this sporadic NMI I see.

Thanks,
Joe

[1]: https://patchwork.kernel.org/project/linux-scsi/patch/20161228110524.7516-1-ojab@xxxxxxx/#20106435

[1513141.713575] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[1513141.713576] Do you have a strange power saving mode enabled?
[1513141.713577] Dazed and confused, but trying to continue
[1513141.839140] mpt3sas_cm0: SAS host is non-operational !!!!
[1513142.867056] mpt3sas_cm0: SAS host is non-operational !!!!
[1513143.890996] mpt3sas_cm0: SAS host is non-operational !!!!
[1513144.914887] mpt3sas_cm0: SAS host is non-operational !!!!
[1513145.934806] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.958724] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.965053] mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
[1513146.965423] sd 0:0:7:0: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.973762] sd 0:0:7:0: [sdh] tag#0 CDB: Read(10) 28 00 d7 72 30 b0 00 00 10 00
[1513146.973764] print_req_error: I/O error, dev sdh, sector 3614585008
[1513146.978754] sd 0:0:6:0: [sdg] tag#29 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978756] sd 0:0:6:0: [sdg] tag#9 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978757] sd 0:0:6:0: [sdg] tag#33 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978759] sd 0:0:6:0: [sdg] tag#33 CDB: Read(10) 28 00 d8 47 30 68 00 00 30 00
[1513146.978760] sd 0:0:6:0: [sdg] tag#9 CDB: Write(10) 2a 00 61 d1 ae 20 00 04 00 00
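
Because the event is sporadic, it may help to scan saved dmesg output
for this sequence automatically. A rough sketch that finds the NMI,
counts the "non-operational" polls, and measures the delay until the
dead-IOC message. The function names are hypothetical, and the matched
strings are simply taken from the log above.

```python
import re

# Matches "[seconds.micros] message" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

SAMPLE = [
    "[1513141.713575] Uhhuh. NMI received for unknown reason 30 on CPU 0.",
    "[1513141.839140] mpt3sas_cm0: SAS host is non-operational !!!!",
    "[1513142.867056] mpt3sas_cm0: SAS host is non-operational !!!!",
    "[1513146.965053] mpt3sas_cm0: _base_fault_reset_work: Running",
]


def parse_dmesg(lines):
    """Yield (timestamp_seconds, message) for lines that carry a timestamp."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield float(m.group(1)), m.group(2)


def summarize(lines):
    """Summarize one NMI -> dead-IOC episode found in the given lines."""
    nmi_ts = dead_ts = None
    non_operational = 0
    for ts, msg in parse_dmesg(lines):
        if "NMI received for unknown reason" in msg and nmi_ts is None:
            nmi_ts = ts
        elif "SAS host is non-operational" in msg:
            non_operational += 1
        elif "_base_fault_reset_work" in msg and dead_ts is None:
            dead_ts = ts
    delay = None
    if nmi_ts is not None and dead_ts is not None:
        delay = dead_ts - nmi_ts
    return {
        "nmi": nmi_ts,
        "non_operational": non_operational,
        "dead_ioc": dead_ts,
        "nmi_to_dead_ioc": delay,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In the log above, the dead-IOC message follows the NMI by roughly 5.25
seconds, after six "non-operational" polls about one second apart.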


