On Sat, 07 Sep 2024 12:38:50 +0200, Andrew Cooper wrote: > > On 07/09/2024 8:46 am, Takashi Iwai wrote: > > On Fri, 06 Sep 2024 20:42:09 +0200, > > Ariadne Conill wrote: > >> This patch attempted to work around a DMA issue involving Xen, but > >> causes subtle kernel memory corruption. > >> > >> When I brought up this patch in the XenDevel matrix channel, I was > >> told that it had been requested by the Qubes OS developers because > >> they were trying to fix an issue where the sound stack would fail > >> after a few hours of uptime. They wound up disabling SG buffering > >> entirely instead as a workaround. > >> > >> Accordingly, I propose that we should revert this workaround patch, > >> since it causes kernel memory corruption and that the ALSA and Xen > >> communities should collaborate on fixing the underlying problem in > >> such a way that SG buffering works correctly under Xen. > >> > >> This reverts commit 53466ebdec614f915c691809b0861acecb941e30. > >> > >> Signed-off-by: Ariadne Conill <ariadne@ariadne.space> > >> Cc: stable@xxxxxxxxxxxxxxx > >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx > >> Cc: alsa-devel@xxxxxxxxxxxxxxxx > >> Cc: Takashi Iwai <tiwai@xxxxxxx> > > The relevant code has been largely rewritten for 6.12, so please check > > the behavior with sound.git tree for-next branch. I guess the same > > issue should happen as the Xen workaround was kept and applied there, > > too, but it has to be checked at first. > > > > If the issue is persistent with there, the fix for 6.12 code would be > > rather much simpler like the blow. > > > > > > thanks, > > > > Takashi > > > > --- a/sound/core/memalloc.c > > +++ b/sound/core/memalloc.c > > @@ -793,9 +793,6 @@ static void *snd_dma_sg_alloc(struct snd_dma_buffer *dmab, size_t size) > > int type = dmab->dev.type; > > void *p; > > > > - if (cpu_feature_enabled(X86_FEATURE_XENPV)) > > - return snd_dma_sg_fallback_alloc(dmab, size); > > - > > /* try the standard DMA API allocation at first */ > > if (type == SNDRV_DMA_TYPE_DEV_WC_SG) > > dmab->dev.type = SNDRV_DMA_TYPE_DEV_WC; > > > > > > Individual subsystems ought not to know or care about XENPV; it's a > layering violation. > > If the main APIs don't behave properly, then it probably means we've got > a bug at a lower level (e.g. Xen SWIOTLB is a constant source of fun) > which is probably affecting other subsystems too. > > I think we need to re-analyse the original bug. Right now, the > behaviour resulting from 53466ebde is worse than what it was trying to fix. I agree with its hackishness and ugliness there; it was to fix the regression at that time. And, it definitely needs a cleaner solution. But the fact is that this change was applied quite some time ago (over a year). It implies that it's no urgent regression that has to be addressed in the next few days for 6.11-final. Meanwhile we have largish code changes for 6.12, hence the revert can't be applied. So I'm asking testing the latest code for 6.12, and work on there. thanks, Takashi