On Tue, 11 May 2010, Takashi Iwai wrote: > At Tue, 11 May 2010 11:07:05 +0200 (CEST), > Jaroslav Kysela wrote: >> >> On Tue, 11 May 2010, Takashi Iwai wrote: >> >>> At Tue, 11 May 2010 10:44:17 +0200 (CEST), >>> Jaroslav Kysela wrote: >>>> >>>> On Tue, 11 May 2010, Takashi Iwai wrote: >>>> >>>>> At Tue, 11 May 2010 10:28:50 +0200 (CEST), >>>>> Jaroslav Kysela wrote: >>>>>> >>>>>> On Mon, 10 May 2010, Jaroslav Kysela wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I would like to ask HDA gurus to check this patch (I will include it >>>>>>> to my tree once acked). The patch uses WALLCLK from HDA chips (marked as >>>>>>> required in the HDA specification) to check for bogus - too early - interrups >>>>>>> which confuses the upper PCM layer (sound skipping issues). >>>>>>> >>>>>>> More details about patch testing on problematic hardware can be found >>>>>>> at: >>>>>>> >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15912 >>>>>> >>>>>> No objections received, so I pushed this code to my master/devel branches. >>>>> >>>>> Heh, that was quick, I had no time to review :) >>>>> >>>>> The change looks good to me. Thanks for your fix (and rebasing). >>>>> >>>>> One thing is that you removed bdl_pos_adj=0 check. This was there >>>>> explicitly to avoid the delayed irq handling no matter whether the >>>>> value is correct or not. I guess it's safer to take it back, or >>>>> give any other way for user to avoid the delayed irq handling. >>>> >>>> I see. But I need the pos % period_bytes check too. What about this >>>> change? >>> >>> With bdl_pos_adj==0, it should return 1 (non-error), but otherwise >>> this should be OK, I guess. >> >> The problem with returning value 1 -> call snd_pcm_elapsed() - is that the >> upper layer will be most probably confused on broken hw. From logs in >> kernel bz#15912, the bad irq can be triggered at any time (I really don't >> know why INTSTS register is set). >> >> The first wallclk check uses to check for first 2/3 of period and the >> pos % period_bytes check checks for second half of period. Writing this, I >> think that the extra check can be added to the second condition to do >> position related check for first next period only: > > Isn't a similar check done in PCM core side? > Filtering or delaying ack is perhaps better done here, though... Unfortunately, the check in PCM core expects that at least 'period_time' is elapsed between two snd_pcm_elapsed() calls: if (in_interrupt) { /* we know that one period was processed */ /* delta = "expected next hw_ptr" for in_interrupt != 0 */ delta = runtime->hw_ptr_interrupt + runtime->period_size; In situation, when the driver calls the snd_pcm_elapsed() too early, the pointers will be updated wrongly (whole ring buffer size will be added to hw_ptr). >> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c >> index 0a6c55b..d4d532a 100644 >> --- a/sound/pci/hda/hda_intel.c >> +++ b/sound/pci/hda/hda_intel.c >> @@ -1890,9 +1890,8 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) >> unsigned int pos; >> int stream; >> >> - wallclk = azx_readl(chip, WALLCLK); >> - if ((wallclk - azx_dev->start_wallclk) < >> - (azx_dev->period_wallclk * 2) / 3) >> + wallclk = azx_readl(chip, WALLCLK) - azx_dev->start_wallclk; >> + if (wallclk < (azx_dev->period_wallclk * 2) / 3) >> return -1; /* bogus (too early) interrupt */ >> >> stream = azx_dev->substream->stream; >> @@ -1910,9 +1909,12 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) >> >> if (WARN_ONCE(!azx_dev->period_bytes, >> "hda-intel: zero azx_dev->period_bytes")) >> - return 0; /* this shouldn't happen! */ >> - if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2) >> - return 0; /* NG - it's below the period boundary */ >> + /* this shouldn't happen! */ >> + return bdl_pos_adj[chip->dev_index] ? 0 : -1; >> + if (wallclk <= azx_dev->period_wallclk && >> + pos % azx_dev->period_bytes > azx_dev->period_bytes / 2) >> + /* NG - it's below the first next period boundary */ >> + return bdl_pos_adj[chip->dev_index] ? 0 : -1; >> azx_dev->start_wallclk = wallclk; >> return 1; /* OK, it's fine */ >> } >> >> I don't see any reason to call snd_pcm_elapsed() in bad time. It's better >> to return -1 (ignore) and wait for next interrupt. > > Hm, judging from the current situation, a missing IRQ is harmless than > the wrong position ack, indeed. > > But delaying the period handling is a bit different from filtering > bogus IRQs. I'm not entirely sure which is the cause in the bug > above. If it's a bogus IRQ, could it be filtered out. If it's an IRQ > at wrong time (too early), the delayed ack is needed. If it's the > wrong position read, the delayed ack isn't needed but the position > correction is needed. > > So, I'm a bit afraid of adding too much checks that might give a > reaction based on wrong information. We'd need to take a detailed > look again at each cause... I'm also not so much happy to have this code in the driver, but it seems that HDA hardware does not respect the standard behaviour in some cases. Anyway, it's good to have the code robust. > BTW, now I looked back again, and think > WARN_ONCE(!azx_dev->period_bytes) should be also returning -1 for > normal cases, no? This is just a sanity check, and if this happens, > the chance that you get the right IRQ at the next moment isn't high, > since something got screwed up. I agree. I put all suggested changes to my tree into "[ALSA] snd-hda-intel: Improve azx_position_ok()" commit. Jaroslav ----- Jaroslav Kysela <perex@xxxxxxxx> Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc. _______________________________________________ Alsa-devel mailing list Alsa-devel@xxxxxxxxxxxxxxxx http://mailman.alsa-project.org/mailman/listinfo/alsa-devel