Re: [BUG] pm80xx crashing during SATA discovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On 28 Jul 2022, at 20:03, John Garry <john.garry@xxxxxxxxxx> wrote:
> 
>> Hi,
>> I'm having trouble getting the pm80xx driver to discover SATA drives.
>> Enclosure has both SAS and SATA drives and the controller is Adaptec Device 8074 (rev 06) on x86_64 machine.
>> Without ATA support in libsas, SAS drives alone are working correctly on 5.18.
>> After enabling ATA support the driver started crashing.
>> With scsi_mod.scan=sync and high loglevel for pm80xx I was able to perform a couple
>> of commands before the kernel crashed.
> 
> Has this worked ok for any other recent kernel version? It seems that 5.18 and 5.19-rcX are broken for you, but it would be good to know if your setup ever worked without issue. Damien (cc'ed) and I have been making changes to this driver and libsas recently but don't experience these sorts of problems.
This is a new environmen, It did not work before. So far I tested 4.11 and 5.18 and 5.19 with similar crashes around spinlock in mpi_sata_completion

> 
>> Applying "libsas and drivers: NCQ error handling" (https://patchwork.kernel.org/project/linux-scsi/list/?series=662216)
>> series of patches prevents the driver from crashing but all operations are timing out.
> 
> That could be related to patch 1/6, where I think that you might experience a use-after-free (which that patch attempts to fix). Turning KASAN on should help detect (without my patch).


vanilla 5.19-rc8 with KASAN enabled and scsi_mod.scan=sync , you ware right there is a use after free there

[   72.174677] ata8.00: qc timeout (cmd 0x27)
[   72.174692] ata14.00: qc timeout (cmd 0x27)
[   72.184006] ata14.00: failed to read native max address (err_mask=0x4)
[   72.184011] ata14.00: HPA support seems broken, skipping HPA handling
[   72.188588] ==================================================================
[   72.188591] BUG: KASAN: use-after-free in mpi_sata_completion+0x2470/0x2d20 [pm80xx]
[   72.203067] Read of size 8 at addr ffff8884336c4f00 by task ksoftirqd/0/14
[   72.219769]
[   72.219772] CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 5.19.0-rc8 #1
[   72.227483] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   72.227545] ata8.00: failed to read native max address (err_mask=0x4)
[   72.227549] ata8.00: HPA support seems broken, skipping HPA handling
[   72.227552] ata8.00: revalidation failed (errno=-5)
[   72.229110] Call Trace:
[   72.229113]  <TASK>
[   72.276458]  dump_stack_lvl+0x33/0x42
[   72.280555]  print_report.cold.12+0xf5/0x67c
[   72.285331]  ? mpi_sata_completion+0x2470/0x2d20 [pm80xx]
[   72.291412]  kasan_report+0xa5/0x120
[   72.295411]  ? mpi_sata_completion+0x2470/0x2d20 [pm80xx]
[   72.301490]  mpi_sata_completion+0x2470/0x2d20 [pm80xx]
[   72.307368]  ? ttwu_do_wakeup+0x21/0x560
[   72.311753]  ? pm80xx_chip_sata_req+0x1e50/0x1e50 [pm80xx]
[   72.317927]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[   72.323475]  ? _raw_spin_unlock+0x39/0x60
[   72.327955]  ? _raw_spin_lock_irqsave+0x8d/0xe0
[   72.333018]  ? _raw_read_lock_bh+0x40/0x40
[   72.337597]  ? sched_clock_cpu+0x69/0x2a0
[   72.342079]  process_oq+0xe56/0x5bc0 [pm80xx]
[   72.346988]  ? psi_task_switch+0x3e8/0x4a0
[   72.351567]  ? mpi_sata_completion+0x2d20/0x2d20 [pm80xx]
[   72.357636]  ? put_prev_entity+0x4d/0x210
[   72.362118]  ? psi_task_change+0x1f0/0x1f0
[   72.366696]  ? __wake_up_common_lock+0x130/0x130
[   72.371856]  ? __switch_to_asm+0x42/0x70
[   72.376241]  pm80xx_chip_isr+0x63/0x160 [pm80xx]
[   72.381447]  tasklet_action_common.isra.19+0x1ed/0x340
[   72.387192]  __do_softirq+0x199/0x575
[   72.391286]  ? tasklet_kill+0x1b0/0x1b0
[   72.395571]  run_ksoftirqd+0x26/0x30
[   72.399564]  smpboot_thread_fn+0x420/0x6c0
[   72.404143]  ? sort_range+0x20/0x20
[   72.408040]  kthread+0x265/0x300
[   72.411647]  ? kthread_complete_and_exit+0x20/0x20
[   72.416999]  ret_from_fork+0x1f/0x30
[   72.420997]  </TASK>
[   72.423437]
[   72.425100] Allocated by task 198:
[   72.428898]  kasan_save_stack+0x1c/0x40
[   72.433184]  __kasan_slab_alloc+0x6d/0x90
[   72.437663]  kmem_cache_alloc+0x134/0x460
[   72.442144]  sas_alloc_task+0x1d/0xa0 [libsas]
[   72.447127]  sas_ata_qc_issue+0x1a8/0xba0 [libsas]
[   72.452496]  ata_qc_issue+0x498/0xbb0
[   72.456588]  ata_exec_internal_sg+0xafb/0x17f0
[   72.461550]  ata_read_native_max_address+0x1b3/0x480
[   72.467097]  ata_dev_configure+0x1ba7/0x5180
[   72.471867]  ata_eh_recover+0x15b5/0x3030
[   72.476346]  ata_do_eh+0x40/0xb0
[   72.479952]  ata_scsi_port_error_handler+0x47d/0xb90
[   72.485496]  async_sas_ata_eh+0xcf/0x112 [libsas]
[   72.490768]  async_run_entry_fn+0x93/0x510
[   72.495344]  process_one_work+0x889/0x14c0
[   72.499919]  worker_thread+0x84/0x11f0
[   72.504107]  kthread+0x265/0x300
[   72.507712]  ret_from_fork+0x1f/0x30
[   72.511705]
[   72.513367] Freed by task 198:
[   72.516776]  kasan_save_stack+0x1c/0x40
[   72.521060]  kasan_set_track+0x21/0x30
[   72.525247]  kasan_set_free_info+0x20/0x30
[   72.529822]  __kasan_slab_free+0x101/0x170
[   72.534398]  kmem_cache_free+0xa1/0x4b0
[   72.538682]  ata_exec_internal_sg+0xc72/0x17f0
[   72.543646]  ata_read_native_max_address+0x1b3/0x480
[   72.549192]  ata_dev_configure+0x1ba7/0x5180
[   72.553961]  ata_eh_recover+0x15b5/0x3030
[   72.558440]  ata_do_eh+0x40/0xb0
[   72.562045]  ata_scsi_port_error_handler+0x47d/0xb90
[   72.567590]  async_sas_ata_eh+0xcf/0x112 [libsas]
[   72.572861]  async_run_entry_fn+0x93/0x510
[   72.577436]  process_one_work+0x889/0x14c0
[   72.582012]  worker_thread+0x84/0x11f0
[   72.586199]  kthread+0x265/0x300
[   72.589803]  ret_from_fork+0x1f/0x30
[   72.593798]
[   72.595461] The buggy address belongs to the object at ffff8884336c4f00
[   72.595461]  which belongs to the cache sas_task of size 264
[   72.609152] The buggy address is located 0 bytes inside of
[   72.609152]  264-byte region [ffff8884336c4f00, ffff8884336c5008)
[   72.622067]
[   72.623729] The buggy address belongs to the physical page:
[   72.629950] page:000000001e6555bf refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4336c4
[   72.640445] head:000000001e6555bf order:1 compound_mapcount:0 compound_pincount:0
[   72.648801] flags: 0xbfff80000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[   72.656971] raw: 0bfff80000010200 0000000000000000 dead000000000122 ffff888106682dc0
[   72.665620] raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[   72.674265] page dumped because: kasan: bad access detected
[   72.680486]
[   72.682149] Memory state around the buggy address:
[   72.687499]  ffff8884336c4e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   72.695565]  ffff8884336c4e80: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   72.703630] >ffff8884336c4f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   72.711694]                    ^
[   72.715297]  ffff8884336c4f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   72.723352]  ffff8884336c5000: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   72.731407] ==================================================================
[   72.739460] Disabling lock debugging due to kernel taint
[   79.854640] ata8.00: qc timeout (cmd 0xec)
[   79.867706] sas: Internal abort: task to dev 500e004abbbbbb02 response: 0x5 status 0x0
[   79.883327] sas: Internal abort: task to dev 500e004abbbbbb02 response: 0x5 status 0x0
[   79.898953] sas: Internal abort: task to dev 500e004abbbbbb02 response: 0x5 status 0x0
[   79.907810] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   79.914618] ata8.00: revalidation failed (errno=-5)
[   81.902636] ata9.00: qc timeout (cmd 0x2f)
[   81.907295] ata9.00: Read log 0x00 page 0x00 failed, Emask 0x4
[   81.913829] ata9.00: failed to set xfermode (err_mask=0x40)
[   89.582637] ata9.00: qc timeout (cmd 0xec)
[   89.582637] ata14.00: qc timeout (cmd 0x2f)
[   89.591934] sas: Internal abort: task to dev 500e004abbbbbb05 response: 0x5 status 0x0
[   89.602140] sas: Internal abort: task to dev 500e004abbbbbb05 response: 0x5 status 0x0
[   89.602153] sas: Internal abort: task to dev 500e004abbbbbb0f response: 0x5 status 0x0
[   89.619835] sas: Internal abort: task to dev 500e004abbbbbb05 response: 0x5 status 0x0
[   89.619859] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   89.635491] ata9.00: revalidation failed (errno=-5)
[   89.635516] sas: Internal abort: task to dev 500e004abbbbbb0f response: 0x5 status 0x0
[   89.650912] sas: Internal abort: task to dev 500e004abbbbbb0f response: 0x5 status 0x0
[   89.659791] ata14.00: Read log 0x13 page 0x00 failed, Emask 0x4
[   89.666407] ata14.00: Read log 0x00 page 0x00 failed, Emask 0x40
[   89.673120] ata14.00: NCQ Send/Recv Log not supported
[   89.678770] ata14.00: Read log 0x00 page 0x00 failed, Emask 0x40
[   89.685498] ata14.00: failed to set xfermode (err_mask=0x40)
[   89.691823] ata14.00: limiting speed to UDMA/133:PIO3
[   91.323501] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[   91.323504] Modules linked in: pm80xx(+) libsas scsi_transport_sas loop tg3 virtio_rng virtio_net net_failover failover
[   91.323526] CPU: 0 PID: 14 Comm: ksoftirqd/0 Tainted: G    B             5.19.0-rc8 #1
[   91.323535] RIP: 0010:kasan_report+0x1b/0x120
[   91.323542] Code: aa 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 56 41 55 41 54 55 53 48 83 ec 38 9c 41 5d 0f 01 ca 65 48 8b 04 25 00 ec 02 00 <8b> 80 20 0b 00 00 85 c0 0f 85 92 00 00 00 48 8b9
[   91.323547] RSP: 0018:ffffc9000012f980 EFLAGS: 00000082
[   91.323552] RAX: ffff8881003dec00 RBX: ffff8884336c4f08 RCX: ffffffff812b798b
[   91.323556] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8884336c4f08
[   91.323560] RBP: 0000000000000003 R08: fffffffffffffffb R09: ffffed10866d89e1
[   91.323563] R10: 0000000000000003 R11: ffffed10866d89e1 R12: 1ffff92000025f3f
[   91.323566] R13: 0000000000000082 R14: ffff8884336c4f6c R15: ffff8884336c4f70
[   91.323570] FS:  0000000000000000(0000) GS:ffff88838b000000(0000) knlGS:0000000000000000
[   91.323575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   91.323578] CR2: 000000c0002a9ff8 CR3: 0000000432ed2006 CR4: 00000000003706f0
[   91.323582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   91.323585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   91.323589] Call Trace:
[   91.323591]  <TASK>
[   91.323596]  queued_spin_lock_slowpath+0x8fb/0xa50
[   91.323604]  ? osq_unlock+0x1f0/0x1f0
[   91.323610]  ? irq_work_claim+0x16/0x70
[   91.323617]  ? irq_work_queue+0xd/0x50
[   91.323623]  ? __wake_up_klogd.part.32+0x79/0xb0
[   91.323629]  ? vprintk_emit+0xf7/0x2b0
[   91.323633]  ? print_report.cold.12+0x66c/0x67c
[   91.323642]  _raw_spin_lock_irqsave+0xd9/0xe0
[   91.323649]  ? _raw_read_lock_bh+0x40/0x40
[   91.323658]  mpi_sata_completion+0x389/0x2d20 [pm80xx]
[   91.323717]  ? ttwu_do_wakeup+0x21/0x560
[   91.323723]  ? pm80xx_chip_sata_req+0x1e50/0x1e50 [pm80xx]
[   91.323773]  ? _raw_spin_unlock+0x39/0x60
[   91.323780]  ? _raw_spin_lock_irqsave+0x8d/0xe0
[   91.323786]  ? _raw_read_lock_bh+0x40/0x40
[   91.323793]  ? sched_clock_cpu+0x69/0x2a0
[   91.323799]  process_oq+0xe56/0x5bc0 [pm80xx]
[   91.323854]  ? psi_task_switch+0x3e8/0x4a0
[   91.323861]  ? mpi_sata_completion+0x2d20/0x2d20 [pm80xx]
[   91.323911]  ? put_prev_entity+0x4d/0x210
[   91.323918]  ? psi_task_change+0x1f0/0x1f0
[   91.323924]  ? __wake_up_common_lock+0x130/0x130
[   91.323930]  ? __switch_to_asm+0x42/0x70
[   91.323937]  pm80xx_chip_isr+0x63/0x160 [pm80xx]
[   91.323988]  tasklet_action_common.isra.19+0x1ed/0x340
[   91.323996]  __do_softirq+0x199/0x575
[   91.324002]  ? tasklet_kill+0x1b0/0x1b0
[   91.324008]  run_ksoftirqd+0x26/0x30
[   91.324013]  smpboot_thread_fn+0x420/0x6c0
[   91.324020]  ? sort_range+0x20/0x20
[   91.324027]  kthread+0x265/0x300
[   91.324031]  ? kthread_complete_and_exit+0x20/0x20
[   91.324037]  ret_from_fork+0x1f/0x30
[   91.324046]  </TASK>
[   91.324049] Kernel panic - not syncing: Hard LOCKUP
[   91.324052] CPU: 0 PID: 14 Comm: ksoftirqd/0 Tainted: G    B             5.19.0-rc8 #1
[   91.324058] Hardware name: EMC InfinityES6/110-365-300J-00, BIOS 33.15 08/17/2017
[   91.324060] Call Trace:
[   91.324062]  <NMI>
[   91.324064]  dump_stack_lvl+0x33/0x42
[   91.324072]  panic+0x1d4/0x3f7
[   91.324078]  ? panic_print_sys_info+0x75/0x75
[   91.324085]  ? nmi_panic+0x27/0x60
[   91.324093]  nmi_panic.cold.7+0x14/0x14
[   91.324098]  watchdog_overflow_callback.cold.3+0xcf/0x103
[   91.324104]  __perf_event_overflow+0x11c/0x370
[   91.324113]  handle_pmi_common+0x49a/0x6d0
[   91.324122]  ? intel_pmu_save_and_restart+0xe0/0xe0
[   91.324130]  ? __native_set_fixmap+0x24/0x30
[   91.324138]  ? memcpy_fromio+0x26/0x110
[   91.324147]  ? ghes_copy_tofrom_phys+0x90/0x160
[   91.324158]  intel_pmu_handle_irq+0x269/0xbe0
[   91.324167]  perf_event_nmi_handler+0x37/0x50
[   91.324174]  nmi_handle+0xa5/0x260
[   91.324183]  default_do_nmi+0x72/0x170
[   91.324191]  exc_nmi+0x117/0x140
[   91.324197]  end_repeat_nmi+0x16/0x31
[   91.324202] RIP: 0010:kasan_report+0x1b/0x120
[   91.324207] Code: aa 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 56 41 55 41 54 55 53 48 83 ec 38 9c 41 5d 0f 01 ca 65 48 8b 04 25 00 ec 02 00 <8b> 80 20 0b 00 00 85 c0 0f 85 92 00 00 00 48 8b9
[   91.324212] RSP: 0018:ffffc9000012f980 EFLAGS: 00000082
[   91.324216] RAX: ffff8881003dec00 RBX: ffff8884336c4f08 RCX: ffffffff812b798b
[   91.324220] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8884336c4f08
[   91.324224] RBP: 0000000000000003 R08: fffffffffffffffb R09: ffffed10866d89e1
[   91.324227] R10: 0000000000000003 R11: ffffed10866d89e1 R12: 1ffff92000025f3f
[   91.324230] R13: 0000000000000082 R14: ffff8884336c4f6c R15: ffff8884336c4f70
[   91.324236]  ? queued_spin_lock_slowpath+0x8fb/0xa50
[   91.324244]  ? kasan_report+0x1b/0x120
[   91.324250]  ? kasan_report+0x1b/0x120
[   91.324255]  </NMI>
[   91.324257]  <TASK>
[   91.324261]  queued_spin_lock_slowpath+0x8fb/0xa50
[   91.324267]  ? osq_unlock+0x1f0/0x1f0
[   91.324273]  ? irq_work_claim+0x16/0x70
[   91.324279]  ? irq_work_queue+0xd/0x50
[   91.324284]  ? __wake_up_klogd.part.32+0x79/0xb0
[   91.324289]  ? vprintk_emit+0xf7/0x2b0
[   91.324294]  ? print_report.cold.12+0x66c/0x67c
[   91.324302]  _raw_spin_lock_irqsave+0xd9/0xe0
[   91.324309]  ? _raw_read_lock_bh+0x40/0x40
[   91.324317]  mpi_sata_completion+0x389/0x2d20 [pm80xx]
[   91.324371]  ? ttwu_do_wakeup+0x21/0x560
[   91.324376]  ? pm80xx_chip_sata_req+0x1e50/0x1e50 [pm80xx]
[   91.324427]  ? _raw_spin_unlock+0x39/0x60
[   91.324433]  ? _raw_spin_lock_irqsave+0x8d/0xe0
[   91.324440]  ? _raw_read_lock_bh+0x40/0x40
[   91.324446]  ? sched_clock_cpu+0x69/0x2a0
[   91.324452]  process_oq+0xe56/0x5bc0 [pm80xx]
[   91.324507]  ? psi_task_switch+0x3e8/0x4a0
[   91.324514]  ? mpi_sata_completion+0x2d20/0x2d20 [pm80xx]
[   91.324565]  ? put_prev_entity+0x4d/0x210
[   91.324571]  ? psi_task_change+0x1f0/0x1f0
[   91.324578]  ? __wake_up_common_lock+0x130/0x130
[   91.324583]  ? __switch_to_asm+0x42/0x70
[   91.324590]  pm80xx_chip_isr+0x63/0x160 [pm80xx]
[   91.324644]  tasklet_action_common.isra.19+0x1ed/0x340
[   91.324651]  __do_softirq+0x199/0x575
[   91.324658]  ? tasklet_kill+0x1b0/0x1b0
[   91.324663]  run_ksoftirqd+0x26/0x30
[   91.324668]  smpboot_thread_fn+0x420/0x6c0
[   91.324676]  ? sort_range+0x20/0x20
[   91.324682]  kthread+0x265/0x300
[   91.324686]  ? kthread_complete_and_exit+0x20/0x20
[   91.324692]  ret_from_fork+0x1f/0x30
[   91.324701]  </TASK>
[   91.324708] Kernel Offset: disabled

Thanks,
Michal



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux