On Mon, Apr 5, 2021 at 9:00 PM Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>
>
> Joe,
>
> > Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> > controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> > replicated for SAS-3.0 HBAs. This change replicates this behavior.
>
> Do you have a system that exhibits problems with ASPM enabled?

I am not sure. I get intermittent messages in dmesg, as seen below, and
stumbled upon commit ffdadd68af5a while researching them; it looked
similar. I haven't found a way to easily or reliably reproduce this
issue, but it surfaces as dmesg reporting an unknown NMI and all the
disks suddenly going offline. Some sort of controller fault appears to
be occurring, judging by the dmesg line which says "mpt3sas_cm0:
_base_fault_reset_work: Running mpt3sas_dead_ioc thread success."

My naive thought process was that:
- A message from Sreekanth back in ~2016 suggested that ASPM should be
  disabled explicitly for SAS-2.0 [1]; perhaps this is also true for
  SAS-3.0?
- Not sure, but disabling ASPM for SAS-3.0 probably wouldn't negatively
  impact users.
- Disabling ASPM explicitly in the driver only has an impact if the BIOS
  has given the kernel control of ASPM, but it could be a good safeguard.
- It may (or may not) reduce the incidence of this event I sporadically
  see.

Is there a way to induce ASPM events so that I could test this? Or
perhaps I can tweak the fault handler to get more information about the
specific type of fault?

All in all, I figured the change was relatively harmless and could
reduce the incidence of this sporadic NMI I see.

Thanks,
Joe

[1]: https://patchwork.kernel.org/project/linux-scsi/patch/20161228110524.7516-1-ojab@xxxxxxx/#20106435

[1513141.713575] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[1513141.713576] Do you have a strange power saving mode enabled?
[1513141.713577] Dazed and confused, but trying to continue
[1513141.839140] mpt3sas_cm0: SAS host is non-operational !!!!
[1513142.867056] mpt3sas_cm0: SAS host is non-operational !!!!
[1513143.890996] mpt3sas_cm0: SAS host is non-operational !!!!
[1513144.914887] mpt3sas_cm0: SAS host is non-operational !!!!
[1513145.934806] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.958724] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.965053] mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
[1513146.965423] sd 0:0:7:0: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.973762] sd 0:0:7:0: [sdh] tag#0 CDB: Read(10) 28 00 d7 72 30 b0 00 00 10 00
[1513146.973764] print_req_error: I/O error, dev sdh, sector 3614585008
[1513146.978754] sd 0:0:6:0: [sdg] tag#29 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978756] sd 0:0:6:0: [sdg] tag#9 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978757] sd 0:0:6:0: [sdg] tag#33 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978759] sd 0:0:6:0: [sdg] tag#33 CDB: Read(10) 28 00 d8 47 30 68 00 00 30 00
[1513146.978760] sd 0:0:6:0: [sdg] tag#9 CDB: Write(10) 2a 00 61 d1 ae 20 00 04 00 00