On Wed, 22 Sep 2021 00:18:14 +0200, Pierre-Louis Bossart wrote: > > Sorry Takashi for the delay, I missed your reply. > > >>> The position reporting on Intel Skylake and later chips via > >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams. > >>> A call for this alone doesn't sound too harmful. However, as the > >>> pointer PCM ops is one of the hottest path in the PCM operations -- > >>> especially for the timer-scheduled operations like PulseAudio -- such > >>> a delay hogs CPU usage significantly in the total performance. > >>> > >>> The function there was taken from the original code in ASoC SST > >>> Skylake driver blindly. The udelay() is a workaround for the case > >>> where the reported position is behind the period boundary at the > >>> timing triggered from interrupts; applications often expect that the > >>> full data is available for the whole period when returned (and also > >>> that's the definition of the ALSA PCM period). > >> > >> that initial work-around from the Intel SST driver does not seem to be > >> legit in the first place, when I checked with hardware folks recently no > >> one understands why there are delays and special cases for capture. The > >> only requirement wrt. DPIB is that the DDR update is valid across VC0 > >> and VC1, while the DPIB registers are only valid with VC0. For SOF we > >> don't know of any VC1 use so will default to the DPIB vendor-specific > >> registers. > > > > What are those VC0 and VC1 registers? I can't find the definitions in > > the code, so I assume that none of ALSA/ASoC drivers use VC1. > > These are PCI concepts/capabilities. VC stands for "Virtual Channel", > which are mapped to Traffic Class (TC). These registers are typically > set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB > updates are valid, the vendor-specific will report information can be > misleading. > > The recommendation from hardware folks is to use DPIB updates in DDR, > which are known to work across both VC0 and VC1. > > For SOF, we do know VC1 is never used so we'll use the vendor-specific > registers. OK, thanks for clarification. > >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes > >> for SOF. > >> > >> I don't have the time to look at this specific patch today but wanted to > >> make sure you are aware of my on-going fixes. > >> > >> Note that the use of DPIB works best if you don't use the IOC interrupt. > >> when the interrupt is thrown, there is likely a delay for the DPIB > >> information to be refreshed. > > > > Thanks for the information! The delay could be the reason of the > > udelay(), and that's superfluous as mentioned in the commit. > > > > So the remaining question seems to be which method is a better > > approach for the capture stream: DPIB or posbuf. I kept the posbuf > > just for conservatism, but judging from your comment, we may use DPIB > > for both directions safely? > > sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports? > > I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev); > unconditionally, both for capture and playback, and remove the use of > the skylake specific stuff and the delay. When I measured, I saw a slight difference between values in DPIB and posbuf, so I wonder which is actually more accurate. The latter is sometimes a bit behind the former. If the posbuf is more correct in the sense for the PCM pointer, we can simply move back to the posbuf like other platforms. thanks, Takashi > > In anyway, the additional mechanism to check the delayed position > > report in this patch can be kept no matter which way (DPIB or posbuf) > > is used. > > Agree! > > > > > > > Takashi > > > >> > >>> > >>> OTOH, HD-audio (legacy) driver has already some workarounds for the > >>> delayed position reporting due to its relatively large FIFO, such as > >>> the BDL position adjustment and the delayed period-elapsed call in the > >>> work. That said, the udelay() is almost superfluous for HD-audio > >>> driver unlike SST, and we can drop the udelay(). > >>> > >>> Though, the current code doesn't guarantee the full period readiness > >>> as mentioned in the above, but rather it checks the wallclock and > >>> detects the unexpected jump. That's one missing piece, and the drop > >>> of udelay() needs a bit more sanity checks for the delayed handling. > >>> > >>> This patch implements those: the drop of udelay() call in > >>> azx_get_pos_skl() and the more proper check of hwptr in > >>> azx_position_ok(). The latter change is applied only for the case > >>> where the stream is running in the normal mode without > >>> no_period_wakeup flag. When no_period_wakeup is set, it essentially > >>> ignores the period handling and rather concentrates only on the > >>> current position; which implies that we don't need to care about the > >>> period boundary at all. > >>> > >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+") > >>> Reported-by: Jens Axboe <axboe@xxxxxxxxx> > >>> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx> > >>> --- > >>> sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++----- > >>> 1 file changed, 23 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > >>> index 3aa432d814a2..faeeeb923d5e 100644 > >>> --- a/sound/pci/hda/hda_intel.c > >>> +++ b/sound/pci/hda/hda_intel.c > >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev) > >>> * the update-IRQ timing. The IRQ is issued before actually the > >>> * data is processed. So, we need to process it afterwords in a > >>> * workqueue. > >>> + * > >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update > >>> */ > >>> static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) > >>> { > >>> struct snd_pcm_substream *substream = azx_dev->core.substream; > >>> + struct snd_pcm_runtime *runtime = substream->runtime; > >>> int stream = substream->stream; > >>> u32 wallclk; > >>> unsigned int pos; > >>> + snd_pcm_uframes_t hwptr, target; > >>> > >>> wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk; > >>> if (wallclk < (azx_dev->core.period_wallclk * 2) / 3) > >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) > >>> /* NG - it's below the first next period boundary */ > >>> return chip->bdl_pos_adj ? 0 : -1; > >>> azx_dev->core.start_wallclk += wallclk; > >>> + > >>> + if (azx_dev->core.no_period_wakeup) > >>> + return 1; /* OK, no need to check period boundary */ > >>> + > >>> + if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt) > >>> + return 1; /* OK, already in hwptr updating process */ > >>> + > >>> + /* check whether the period gets really elapsed */ > >>> + pos = bytes_to_frames(runtime, pos); > >>> + hwptr = runtime->hw_ptr_base + pos; > >>> + if (hwptr < runtime->status->hw_ptr) > >>> + hwptr += runtime->buffer_size; > >>> + target = runtime->hw_ptr_interrupt + runtime->period_size; > >>> + if (hwptr < target) { > >>> + /* too early wakeup, process it later */ > >>> + return chip->bdl_pos_adj ? 0 : -1; > >>> + } > >>> + > >>> return 1; /* OK, it's fine */ > >>> } > >>> > >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) > >>> if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) > >>> return azx_skl_get_dpib_pos(chip, azx_dev); > >>> > >>> - /* For capture, we need to read posbuf, but it requires a delay > >>> - * for the possible boundary overlap; the read of DPIB fetches the > >>> - * actual posbuf > >>> - */ > >>> - udelay(20); > >>> + /* read of DPIB fetches the actual posbuf */ > >>> azx_skl_get_dpib_pos(chip, azx_dev); > >>> return azx_get_pos_posbuf(chip, azx_dev); > >>> } > >>> > >> >