Hi,

Thanks for the regression report, and I'm sorry for the inconvenience.

On Tue, Jul 16, 2024 at 02:51:24PM +0000, edmund.raile wrote:
> On kernels since 5.14.0, ALSA playback to the FireWire RME Fireface 800
> audio interface results in a deadlock involving snd_pcm_period_elapsed(),
> freezing the system.
>
> On kernels 5.0.0 to 5.13.19 the interface works just fine, as it does with
> the RME driver.
>
> Distributions tested:
> Ubuntu
> Manjaro
> Arch
> Fedora
>
> FireWire chipsets tested:
> LSI FW643
> TI XIO2213B
>
> Platforms tested:
> Intel i5 4570 on AsRock H97 Pro4
> Intel i5 12600K on MSI MS-7D25
>
> The behavior was also observed on the RME forum:
> https://forum.rme-audio.de/viewtopic.php?pid=190472#p190472
>
> Shortened traces of 6.10.0-rc7 (Arch linux-mainline):
>
> RIP: 0010:tasklet_unlock_spin_wait (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/softirq.c:851)
> <NMI>
> ? watchdog_hardlockup_check.cold (kernel/watchdog.c:200)
> ? __perf_event_overflow (kernel/events/core.c:9737 (discriminator 2))
> ? handle_pmi_common (arch/x86/events/intel/core.c:3061 (discriminator 1))
> ? intel_pmu_handle_irq (./arch/x86/include/asm/paravirt.h:192 arch/x86/events/intel/core.c:2428 arch/x86/events/intel/core.c:3127)
> ? perf_event_nmi_handler (arch/x86/events/core.c:1744 arch/x86/events/core.c:1730)
> ? nmi_handle (arch/x86/kernel/nmi.c:151)
> ? default_do_nmi (arch/x86/kernel/nmi.c:352 (discriminator 61))
> ? exc_nmi (arch/x86/kernel/nmi.c:546)
> ? end_repeat_nmi (arch/x86/entry/entry_64.S:1408)
> ? tasklet_unlock_spin_wait (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/softirq.c:851)
> ? tasklet_unlock_spin_wait (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/softirq.c:851)
> ? tasklet_unlock_spin_wait (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/softirq.c:851)
> </NMI>
> <TASK>
> ohci_flush_iso_completions (./include/linux/interrupt.h:740 drivers/firewire/ohci.c:3530) firewire_ohci
> amdtp_domain_stream_pcm_pointer (sound/firewire/amdtp-stream.c:1858) snd_firewire_lib
> snd_pcm_update_hw_ptr0 (sound/core/pcm_lib.c:304) snd_pcm
> snd_pcm_status64 (sound/core/pcm_native.c:1034) snd_pcm
> snd_pcm_status_user64 (./include/linux/uaccess.h:191 sound/core/pcm_native.c:1096) snd_pcm
> snd_pcm_ioctl (sound/core/pcm_native.c:3401 (discriminator 1)) snd_pcm
> __x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:907 fs/ioctl.c:893 fs/ioctl.c:893)
> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
> ? snd_pcm_status_user64 (sound/core/pcm_native.c:1096 (discriminator 1)) snd_pcm
> ? futex_wake (kernel/futex/waitwake.c:173)
> ? do_futex (kernel/futex/syscalls.c:107 (discriminator 1))
> ? __x64_sys_futex (kernel/futex/syscalls.c:179 kernel/futex/syscalls.c:160 kernel/futex/syscalls.c:160)
> ? syscall_exit_to_user_mode (kernel/entry/common.c:221)
> ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:178 arch/x86/entry/common.c:98)
> ? do_futex (kernel/futex/syscalls.c:107 (discriminator 1))
> ? __x64_sys_futex (kernel/futex/syscalls.c:179 kernel/futex/syscalls.c:160 kernel/futex/syscalls.c:160)
> ? syscall_exit_to_user_mode (kernel/entry/common.c:221)
> ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:178 arch/x86/entry/common.c:98)
> ? syscall_exit_to_user_mode (kernel/entry/common.c:221)
> ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:178 arch/x86/entry/common.c:98)
> ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:178 arch/x86/entry/common.c:98)
> ? __irq_exit_rcu (kernel/softirq.c:620 (discriminator 1) kernel/softirq.c:639 (discriminator 1))
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>
> RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:380 (discriminator 3))
> <NMI>
> ? watchdog_hardlockup_check.cold (kernel/watchdog.c:200)
> ? __perf_event_overflow (kernel/events/core.c:9737 (discriminator 2))
> ? handle_pmi_common (arch/x86/events/intel/core.c:3061 (discriminator 1))
> ? intel_pmu_handle_irq (./arch/x86/include/asm/paravirt.h:192 arch/x86/events/intel/core.c:2428 arch/x86/events/intel/core.c:3127)
> ? perf_event_nmi_handler (arch/x86/events/core.c:1744 arch/x86/events/core.c:1730)
> ? nmi_handle (arch/x86/kernel/nmi.c:151)
> ? default_do_nmi (arch/x86/kernel/nmi.c:352 (discriminator 61))
> ? exc_nmi (arch/x86/kernel/nmi.c:546)
> ? end_repeat_nmi (arch/x86/entry/entry_64.S:1408)
> ? native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:380 (discriminator 3))
> ? native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:380 (discriminator 3))
> ? native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:380 (discriminator 3))
> </NMI>
> <IRQ>
> _raw_spin_lock_irqsave (./arch/x86/include/asm/paravirt.h:584 ./arch/x86/include/asm/qspinlock.h:51 ./include/asm-generic/qspinlock.h:114 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
> snd_pcm_period_elapsed (sound/core/pcm_lib.c:1905) snd_pcm
> process_rx_packets (sound/firewire/amdtp-stream.c:1164) snd_firewire_lib
> irq_target_callback (sound/firewire/amdtp-stream.c:1549) snd_firewire_lib
> handle_it_packet (drivers/firewire/ohci.c:2786 drivers/firewire/ohci.c:2974) firewire_ohci
> context_tasklet (drivers/firewire/ohci.c:1127) firewire_ohci
> tasklet_action_common.isra.0 (kernel/softirq.c:789)
> handle_softirqs (kernel/softirq.c:554)
> __irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
> common_interrupt (arch/x86/kernel/irq.c:278 (discriminator 35))
> </IRQ>
> <TASK>
> asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
>
> It can be induced by direct ALSA playback to the device:
> mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence ranges from two seconds to 30 minutes.
>
> Loading the CPU appears to increase the likelihood:
> stress --cpu $(nproc)
> So does switching between applications via workspaces (only tested Xfce).
>
> The regression has been traced to these two commits:
> 7ba5ca32fe6e8d2e153fb5602997336517b34743
> b5b519965c4c364ae65c49fe9f11d222dd70a9c2
>
> I am currently testing a simple patch, in essence reverting both commits.
> Behaves well so far (stable), will likely send it in tomorrow.
>
> #regzbot introduced: 7ba5ca32fe6e

As far as I can tell from the call trace, the issue is indeed a deadlock
between the process context and the softIRQ (tasklet) context, involving
the group lock for the ALSA PCM substream and the tasklet for the OHCI 1394
IT context.

A. In the process context
    * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
               snd_pcm_status64()
    * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
    * (lock B) Enter tasklet
    * (lock A) Attempt to acquire spin_lock by
               snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()

It is the same issue as you reported in the test branch for the bh
workqueue[1].

I think users rarely face the issue when working with either PipeWire or
PulseAudio, since these processes run with the no-period-wakeup mode of
the runtime for the PCM substream (thus with fewer hardIRQs).

Anyway, one solution is to revert both commit b5b519965c4c ("ALSA:
firewire-lib: obsolete workqueue for period update") and commit
7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in
process context"). The restored workqueue is then responsible for lock A,
thus:

A. In the process context
    * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
               snd_pcm_status64()
    * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
    * (lock B) Enter tasklet
    * schedule workqueue

C. another process context (workqueue)
    * (lock A) Attempt to acquire spin_lock by
               snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()

In this case, the deadlock would not occur.

[1] https://github.com/allenpais/for-6.9-bh-conversions/issues/1


Regards

Takashi Sakamoto