Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller

FPS <mista.tapas@xxxxxxx> · Tue, 20 Aug 2024 23:04:24 +0200

On 8/20/24 13:01, Michał Pecio wrote:
I can offer a few quick suggestions:

1. When kernel bugs are suspected, try other kernels offered by your
distribution. See if there is any chance that -rt patches are causing
issues.

I have tried other kernels, including 4.19.319, 5.15.163-rt78 and
6.6.45. They all show the same behaviour of the audio device not
operating correctly at all with settings like 48 frames, 2 periods.

I have another Alder Lake system with an N100 CPU which has a similar
xHCI and it also shows problem 1 (see my answer to your question 2
below) but not problem 2 (see below as well):

00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI
Host Controller (prog-if 30 [XHCI])
	Subsystem: ASRock Incorporation Device 54ed
	Flags: bus master, medium devsel, latency 0, IRQ 126
	Memory at 6001100000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Capabilities: [b0] Vendor Specific Information: Len=00 <?>
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

I guess it's the same chip but integrated somewhat differently.

In that system I also have a PCIe USB controller:

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host
Controller (rev 03) (prog-if 30 [XHCI])
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 80a00000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
	Capabilities: [a0] Express Endpoint, IntMsgNum 0
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Latency Tolerance Reporting
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

And that one just works perfectly with full duplex operation even down
to buffer sizes of 12 frames / 2 buffers:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 12

$ ./jack_wakeup -n 2000
min: 0.236729 ms; mean: 0.249978ms;  max: 0.258432 ms

0.25 ms would be perfect (2 USB microframes).

So the -rt kernel is not at fault here per se.

(see note below on jack_wakeup)
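As a quick sanity check of the 0.25 ms figure, here is a minimal Python
sketch. It assumes a 48 kHz sample rate (not stated explicitly, but
consistent with the ~1.0 ms mean measured for 48 frames further down in
this mail) and the 125 us microframe of high-speed USB. The function
names are made up for illustration.

```python
USB_MICROFRAME_US = 125  # high-speed USB microframe length in microseconds


def period_duration_ms(frames, sample_rate_hz=48_000):
    """Duration of one period of `frames` audio frames, in milliseconds.

    The 48 kHz default is an assumption inferred from the measurements.
    """
    return frames * 1000.0 / sample_rate_hz


def period_in_microframes(frames, sample_rate_hz=48_000):
    """Same duration expressed in USB microframes."""
    return frames * 1_000_000 / sample_rate_hz / USB_MICROFRAME_US


for frames in (12, 24, 48):
    print(f"{frames} frames: {period_duration_ms(frames):.3f} ms, "
          f"{period_in_microframes(frames):.1f} microframes")
```

Under these assumptions a 12 frame period lasts 0.25 ms, i.e. 2
microframes, which is what the measured mean of 0.249978 ms is compared
against.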

2. Does any of that go away when ALSA buffer size is increased or is it
always there on this machine?

Very good question! I suppose my initial report was a little unclear on
this. On this specific system there are two problems:

1. Certain combinations of buffer size and number of buffers do not work
reliably at all (i.e. xruns). These are basically all combinations of
buffer sizes < 128 and 2 buffers. I tested buffer sizes of 64 frames, 48
frames and 24 frames.

2. For those combinations that do seem to work (for example buffer size
64 with 3 buffers or buffer size of 128 and 2 buffers and up from there)
there are sporadic (about every 2-10 seconds or so) extra delays of
about 2 - 4 ms and they seem to be not randomly distributed at all but
rather always pretty close to full milliseconds.

Buffer sizes above 4 ms seem to work reliably with the light load I
tested but that is expected since the extra delay is then just masked by
the large buffers.
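To put the configurations above next to the 2 - 4 ms delays, the total
buffering can be worked out as period size times number of periods
divided by the sample rate. A small sketch, again assuming 48 kHz (an
assumption, see the measurements below) and using a hypothetical helper
name:

```python
def total_buffer_ms(period_frames, periods, sample_rate_hz=48_000):
    """Total ALSA buffer length in milliseconds.

    The 48 kHz default is an assumption, not a value stated in this mail.
    """
    return period_frames * periods * 1000.0 / sample_rate_hz


# Configurations mentioned above: (period size in frames, number of periods).
CONFIGS = [(64, 2), (64, 3), (128, 2)]

for period_frames, periods in CONFIGS:
    ms = total_buffer_ms(period_frames, periods)
    print(f"{period_frames} frames x {periods}: {ms:.2f} ms of buffering")
```

With these assumptions 64 x 2 gives about 2.67 ms, 64 x 3 gives 4 ms and
128 x 2 gives about 5.33 ms. This is only the raw buffer length; how
much of a stall a given configuration actually survives also depends on
how full the buffer is when the stall happens.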

About problem 2: On the 4.19.319 kernel I tried, problem 2 went away. It
also has not resurfaced after I rebooted into 6.6.44-rt39. I will do a
full power cycle and see if it resurfaces.

3. When posting wall of text errors, start at the beginning because it
may offer clues about what originally went wrong ('dmesg -W' helps).

Sure!

4. Playing a tiny file with 'aplay --period-size=48 --buffer-size=96'
is a simpler way to reproduce the problem and generates a shorter log.

Good point! But I played around some more and it seems that the problem
actually manifests in this precise way only if I actually do full duplex
audio processing. aplay would just use the playback direction, and also
it does not really do correct realtime scheduling.

If I just use the capture direction, buffer sizes like 24 or 48 with 2
buffers appear to work and give me the expected jitter. E.g. for 48:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 48 -C

$ ./src/jack_wakeup/jack_wakeup
min: 0.993217 ms; mean: 1.00002ms;  max: 1.00574 ms

Or for buffer size 24:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 24 -C

$ ./src/jack_wakeup/jack_wakeup
min: 0.488788 ms; mean: 0.499921ms;  max: 0.510261 ms

For buffer size 12, things break though.

Note: jack_wakeup is just a small utility I wrote that measures the
interval between consecutive process callbacks, which in an unloaded
jack graph is a useful measure of jitter in the system.
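The kind of summary printed above can be sketched as follows. This is
not the jack_wakeup source, only an illustration of the statistic: it
takes a list of callback timestamps (here in nanoseconds, with made-up
values) and reports the min, mean and max interval in milliseconds. The
function name is hypothetical.

```python
from statistics import fmean


def wakeup_stats(timestamps_ns):
    """Return (min, mean, max) interval in ms between consecutive timestamps."""
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two timestamps")
    intervals_ms = [
        (later - earlier) / 1e6
        for earlier, later in zip(timestamps_ns, timestamps_ns[1:])
    ]
    return min(intervals_ms), fmean(intervals_ms), max(intervals_ms)


# Made-up timestamps: two wakeups 1 ms apart, then one that is 0.5 ms late.
example = [0, 1_000_000, 2_000_000, 3_500_000]
lo, mean, hi = wakeup_stats(example)
print(f"min: {lo:.6f} ms; mean: {mean:.6f} ms; max: {hi:.6f} ms")
```

In a real measurement the timestamps would be taken with a monotonic
clock at the start of each process callback.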

If you do not like the complexity that jack (jack1 in this case)
introduces I can probably cook up a small C program that just sets up
SCHED_FIFO, mlockall, etc, and does either simplex or full duplex
operations on the audio interface.

I have to note that these "WARN Event TRB for slot 18 ep 1 with no TDs
queued?" were there before enabling this dynamic debug feature, I just
forgot to mention them in my original mail.

This particular part is probably caused by our failure to properly
handle the preceding condition ("underrun event still with TDs queued").
I can't know for sure, but assuming no hardware bugs, it appears that a
new transfer descriptor is queued after the hardware reports a ring
underrun but before we actually process the report. While processing
the underrun we are surprised by this unexpected TD, then we see that
the skip flag is set so we erroneously report all TDs (most likely including
the new one) as failed to the audio driver. Meanwhile the hardware may
execute this transfer and report its completion later, at which point
we have already forgotten about it.

*Maybe* this creates enough chaos that some sort of infinite loop of
cascading errors is established as a result of one recoverable error.
Or maybe your problem is elsewhere and this bug is only a side effect.

Are you able to test kernel patches?

Yes, of course. It's been a couple of years since I did the whole
menuconfig, patch, rebuild, reboot, test dance. I did this quite often
when Ingo posted early versions of the realtime preemption patches. E.g.

https://lkml.org/lkml/2004/11/21/184

Kind regards and thanks for your suggestions,
FPS