Commit b5b519965c4c ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since Kernels 5.14.0,
causing a deadlock with eventual system freeze under ALSA operation:
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the work queue to prevent deadlock between ALSA substream
process context spin_lock of snd_pcm_stream_lock_irq() in snd_pcm_status64()
and OHCI 1394 IT softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
in snd_pcm_period_elapsed().
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Closes: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Signed-off-by: Edmund Raile <edmund.raile@xxxxxxxxx>
---
This is the follow-up patch to the 5.14.0 regression I reported:
https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with Fireface 800")
Takashi Sakamoto explained the issue in his response to the regression:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed()
This leads me to believe this isn't just an issue limited to the RME Fireface
driver (snd_fireface), though I can not test the other devices.
sound/firewire/amdtp-stream.c | 32 +++++++++++++++++++++-----------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index d35d0a420ee0..77b99a2117f4 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -77,6 +77,8 @@
// overrun. Actual device can skip more, then this module stops the packet streaming.
#define IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES 5
+static void pcm_period_work(struct work_struct *work);
+
/**
* amdtp_stream_init - initialize an AMDTP stream structure
* @s: the AMDTP stream to initialize
@@ -105,6 +107,7 @@ int amdtp_stream_init(struct amdtp_stream *s, struct fw_unit *unit,
s->flags = flags;
s->context = ERR_PTR(-1);
mutex_init(&s->mutex);
+ INIT_WORK(&s->period_work, pcm_period_work);
s->packet_index = 0;
init_waitqueue_head(&s->ready_wait);
@@ -347,6 +350,7 @@ EXPORT_SYMBOL(amdtp_stream_get_max_payload);
*/
void amdtp_stream_pcm_prepare(struct amdtp_stream *s)
{
+ cancel_work_sync(&s->period_work);
s->pcm_buffer_pointer = 0;
s->pcm_period_pointer = 0;
}
@@ -611,19 +615,21 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ queue_work(system_highpri_wq, &s->period_work);
}
}
+static void pcm_period_work(struct work_struct *work)
+{
+ struct amdtp_stream *s = container_of(work, struct amdtp_stream,
+ period_work);
+ struct snd_pcm_substream *pcm = READ_ONCE(s->pcm);
+
+ if (pcm)
+ snd_pcm_period_elapsed(pcm);
+}
+
static int queue_packet(struct amdtp_stream *s, struct fw_iso_packet *params,
bool sched_irq)
{
@@ -1854,7 +1860,10 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
if (irq_target && amdtp_stream_running(irq_target)) {
// In software IRQ context, the call causes dead-lock to disable the tasklet
// synchronously.
- if (!in_softirq())
+ // workqueue serves to prevent deadlock between process context spinlock
+ // of snd_pcm_stream_lock_irq() in snd_pcm_status64() and softIRQ spinlock
+ // of snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
+ if ((!in_softirq()) && (current_work() != &s->period_work))
fw_iso_context_flush_completions(irq_target->context);
}
@@ -1910,6 +1919,7 @@ static void amdtp_stream_stop(struct amdtp_stream *s)
return;
}
+ cancel_work_sync(&s->period_work);
fw_iso_context_stop(s->context);
fw_iso_context_destroy(s->context);
s->context = ERR_PTR(-1);
diff --git a/sound/firewire/amdtp-stream.h b/sound/firewire/amdtp-stream.h
index a1ed2e80f91a..775db3fc4959 100644
--- a/sound/firewire/amdtp-stream.h
+++ b/sound/firewire/amdtp-stream.h
@@ -191,6 +191,7 @@ struct amdtp_stream {
/* For a PCM substream processing. */
struct snd_pcm_substream *pcm;
+ struct work_struct period_work;
snd_pcm_uframes_t pcm_buffer_pointer;
unsigned int pcm_period_pointer;
unsigned int pcm_frame_multiplier;
--
2.45.2
[Index of Archives]
[Pulseaudio]
[Linux Audio Users]
[ALSA Devel]
[Fedora Desktop]
[Fedora SELinux]
[Big List of Linux Books]
[Yosemite News]
[KDE Users]