Re: Fwd: Marvell 88SE6320 SAS controller (mvsas) cannot survive ACPI S3 or ACPI S4

Bagas Sanjaya <bagasdotme@xxxxxxxxx> · Thu, 26 Oct 2023 21:12:30 +0700

On Thu, Oct 26, 2023 at 05:56:03PM +0900, Damien Le Moal wrote:
> On 2023/10/26 17:25, Bagas Sanjaya wrote:
> > Hi,
> > 
> > I notice a bug report on Bugzilla [1]. Quoting from it:
> 
> [...]
> 
> >> [  437.249448] PM: suspend entry (deep)
> >> [  437.255308] Filesystems sync: 0.005 seconds
> >> [  437.255570] Freezing user space processes
> >> [  437.257093] Freezing user space processes completed (elapsed 0.001 seconds)
> >> [  437.257097] OOM killer disabled.
> >> [  437.257098] Freezing remaining freezable tasks
> >> [  437.258226] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> >> [  437.258281] printk: Suspending console(s) (use no_console_suspend to debug)
> >> [  437.291778] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
> >> [  437.291825] sd 0:0:1:0: [sdc] Synchronizing SCSI cache
> >> [  437.292083] sd 0:0:0:0: [sdb] Stopping disk
> >> [  437.292083] sd 0:0:1:0: [sdc] Stopping disk
> >> [  438.363660] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> >> [  438.363760] sd 1:0:0:0: [sda] Stopping disk
> 
> Given this message, this does not look like the latest kernel.
> 
> >> [  589.081341] drivers/scsi/mvsas/mv_sas.c 1304:mvs_I_T_nexus_reset for device[1]:rc= 0
> >> [  610.481270] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >> [  610.481280] rcu: 	11-...0: (0 ticks this GP) idle=4f84/1/0x4000000000000000 softirq=19873/19873 fqs=1159
> >> [  610.481292] 	(detected by 5, t=5252 jiffies, g=53581, q=31630 ncpus=12)
> >> [  610.481299] Sending NMI from CPU 5 to CPUs 11:
> >> [  610.481309] NMI backtrace for cpu 11
> >> [  610.481312] CPU: 11 PID: 3152 Comm: kworker/u32:59 Tainted: G          I        6.1.57-vanilla #14
> >> [  610.481318] Hardware name: System manufacturer System Product Name/P6T WS PRO, BIOS 1205    09/24/2010
> >> [  610.481321] Workqueue: events_unbound async_run_entry_fn
> >> [  610.481329] RIP: 0010:mvs_int_rx+0x81/0x150 [mvsas]
> >> [  610.481346] Code: 00 00 44 39 75 70 74 47 48 8b 45 60 45 89 e6 41 81 e6 ff 03 00 00 41 8d 56 01 8b 1c 90 49 89 d4 41 89 df 41 81 e7 00 00 08 00 <f7> c3 00 00 01 00 74 58 31 d2 89 de 48 89 ef e8 0b f9 ff ff 45 85
> >> [  610.481350] RSP: 0018:ffffb61f06acbb60 EFLAGS: 00000046
> >> [  610.481354] RAX: ffff9a7cc2658000 RBX: 0000000000010000 RCX: 0000000000000000
> >> [  610.481358] RDX: 000000000000026e RSI: 0000000000010000 RDI: ffff9a7ce2660000
> >> [  610.481361] RBP: ffff9a7ce2660000 R08: ffff9a7ce2660f00 R09: ffff9a7ce2660000
> >> [  610.481364] R10: ffff9a7ce26600c8 R11: ffffffff884d4300 R12: 000000000000026e
> >> [  610.481367] R13: 0000000000000000 R14: 000000000000026d R15: 0000000000000000
> >> [  610.481371] FS:  0000000000000000(0000) GS:ffff9a7df7cc0000(0000) knlGS:0000000000000000
> >> [  610.481375] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  610.481378] CR2: 0000563633425300 CR3: 0000000077210006 CR4: 00000000000206e0
> >> [  610.481382] Call Trace:
> >> [  610.481385]  <NMI>
> >> [  610.481389]  ? nmi_cpu_backtrace.cold+0x1b/0x76
> >> [  610.481398]  ? nmi_cpu_backtrace_handler+0xd/0x20
> >> [  610.481403]  ? nmi_handle+0x5d/0x120
> >> [  610.481410]  ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [  610.481423]  ? default_do_nmi+0x69/0x170
> >> [  610.481428]  ? exc_nmi+0x13c/0x170
> >> [  610.481432]  ? end_repeat_nmi+0x16/0x67
> >> [  610.481443]  ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [  610.481457]  ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [  610.481470]  ? mvs_int_rx+0x81/0x150 [mvsas]
> >> [  610.481483]  </NMI>
> >> [  610.481484]  <TASK>
> >> [  610.481487]  mvs_do_release_task+0x3f/0x90 [mvsas]
> >> [  610.481501]  mvs_release_task+0x13e/0x1a0 [mvsas]
> >> [  610.481516]  mvs_I_T_nexus_reset+0xb2/0xd0 [mvsas]
> >> [  610.481530]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [  610.481552]  sas_ata_hard_reset+0x48/0x80 [libsas]
> >> [  610.481575]  ata_eh_reset+0x2e5/0x1090 [libata]
> >> [  610.481631]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [  610.481652]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [  610.481676]  ata_eh_recover+0x2e6/0xe00 [libata]
> >> [  610.481728]  ? __wake_up_klogd.part.0+0x56/0x80
> >> [  610.481735]  ? vprintk_emit+0x207/0x290
> >> [  610.481739]  ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
> >> [  610.481760]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [  610.481783]  ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
> >> [  610.481804]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
> >> [  610.481824]  ata_do_eh+0x75/0xf0 [libata]
> >> [  610.481876]  ? del_timer_sync+0x6f/0xb0
> >> [  610.481884]  ata_scsi_port_error_handler+0x3a8/0x800 [libata]
> >> [  610.481938]  async_sas_ata_eh+0x44/0x7f [libsas]
> >> [  610.481960]  async_run_entry_fn+0x30/0x130
> >> [  610.481966]  process_one_work+0x1c7/0x380
> >> [  610.481974]  worker_thread+0x4d/0x380
> >> [  610.481981]  ? rescuer_thread+0x3a0/0x3a0
> >> [  610.481987]  kthread+0xe9/0x110
> >> [  610.481992]  ? kthread_complete_and_exit+0x20/0x20
> >> [  610.481999]  ret_from_fork+0x22/0x30
> >> [  610.482009]  </TASK>
> >> [  665.286198] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
> 
> Could be due to the libata deadlock without the recent suspend/resume fixes. Or
> this is yet another adapter that was not tested for suspend/resume. mpt3sas
> crashes the machine 100% of the time as well. I had no time to dig into that issue.
> 

The reporter on Bugzilla [1] said:

> Hello again,
> 6.6rc7 was unable to resume disks from s3 as expected.
> Basically mvsas does not resume the attached devices at all.
> The suspend/resume logic was never implemented and nothing happens on resume.

It looks like the mvsas driver doesn't implement any S3/S4 (suspend/resume)
logic at all, right?

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218030#add_comment

-- 
An old man doll... just what I always wanted! - Clara