On Fri, 09 Dec 2022 02:27:30 +0100,
Marek Marczykowski-Górecki wrote:
>
> Hi,
>
> Under Xen PV dom0, with Linux >= 5.17, sound stops working after a few
> hours. pavucontrol still shows meter bars moving, but the speakers
> remain silent. At least on some occasions I see the following message in
> dmesg:
>
>     [ 2142.484553] snd_hda_intel 0000:00:1f.3: Unstable LPIB (18144 >= 6396); disabling LPIB delay counting
>
> I'm not sure if that happens before sound stops working, after, or if
> it's related at all, but that's pretty much the only sound-related error
> I found in logs.
> When the issue happens, on rare occasions it starts working again later
> for a short time, but generally the fix is to reboot. Reloading all
> snd_* modules (surprisingly) does not help. I don't know what exactly
> triggers the issue; sometimes it happens after a short time, like 15
> minutes of uptime, but usually after several hours. I guess it depends
> on the usage pattern, but I haven't spotted any specific relation.
>
> I managed to bisect it to this commit:
>
> 2c95b92ecd92e784785b1db8cccc4f0f2bfa850c is the first bad commit
> commit 2c95b92ecd92e784785b1db8cccc4f0f2bfa850c
> Author: Takashi Iwai <tiwai@xxxxxxx>
> Date:   Tue Nov 16 08:33:58 2021 +0100
>
>     ALSA: memalloc: Unify x86 SG-buffer handling (take#3)
>
>     This is a second attempt to unify the x86-specific SG-buffer handling
>     code with the new standard non-contiguous page handler.
>
>     The first try (in commit 2d9ea39917a4) failed due to the wrong page
>     and address calculations, hence reverted. (And the second try failed
>     due to a copy&paste error.) Now it's corrected with the previous fix
>     for noncontig pages, and the proper sg page iteration by this patch.
>
>     After the migration, SNDRV_DMA_TYPE_DMA_SG becomes identical with
>     SNDRV_DMA_TYPE_NONCONTIG on x86, while others still fall back to
>     SNDRV_DMA_TYPE_DEV.
>
>     Tested-by: Alex Xu (Hello71) <alex_y_xu@xxxxxxxx>
>     Tested-by: Harald Arnesen <harald@xxxxxxxxxxx>
>     Link: https://lore.kernel.org/r/20211017074859.24112-4-tiwai@xxxxxxx
>     Link: https://lore.kernel.org/r/20211109062235.22310-1-tiwai@xxxxxxx
>     Link: https://lore.kernel.org/r/20211116073358.19741-1-tiwai@xxxxxxx
>     Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
>
>  include/sound/memalloc.h |  14 ++--
>  sound/core/Makefile      |   1 -
>  sound/core/memalloc.c    |  53 ++++++++++++-
>  sound/core/sgbuf.c       | 201 -----------------------------------------------
>  4 files changed, 56 insertions(+), 213 deletions(-)
>  delete mode 100644 sound/core/sgbuf.c
>
> I've seen further follow-ups to this commit, but I still observe this
> issue on Linux 6.0.8.
>
> I have observed this issue on a KBL-based system, but I've got reports
> also from users of other platforms (including as old as Sandy Bridge).
>
> I tried to include all relevant information above, but some more details
> can be found in the original report at
> https://github.com/QubesOS/qubes-issues/issues/7465
>
> Any ideas?

Hm, is it specific to Xen, i.e. if you run a normal (non-Xen) kernel on
the same machine, does it still work?

In any case, please check the behavior with 6.1-rc8 + the commit
cc26516374065a34e10c9a8bf3e940e42cd96e2a
    ALSA: memalloc: Allocate more contiguous pages for fallback case
from for-next of my sound git tree (which will be in 6.2-rc1).

If the problem persists, another thing to check is whether the hack
below works.

thanks,

Takashi

-- 8< --
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1808,9 +1808,16 @@ static int azx_create(struct snd_card *card, struct pci_dev *pci,
 	if (err < 0)
 		return err;
 
+#if 0
 	/* use the non-cached pages in non-snoop mode */
 	if (!azx_snoop(chip))
 		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_WC_SG;
+#else
+	if (!azx_snoop(chip))
+		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_SG;
+	else
+		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV;
+#endif
 
 	if (chip->driver_type == AZX_DRIVER_NVIDIA) {
 		dev_dbg(chip->card->dev, "Enable delay in RIRB handling\n");