Re: [LAD] Not all XHCI (USB-Controllers) are made equal

ASRock boards like my Z790 LiveMixer
have 2x low-latency USB ports,
the yellow / Lightning USB ports:
https://www.asrock.com/microsite/2021embracethefuture/single-post3.html
https://www.asrock.com/mb/Intel/Z790%20LiveMixer/Specification.asp
https://www.asrock.com/microsite/2022EmbraceTheFuture/single-post8.html
but...
a USB Bluetooth 5.0 dongle does not work in some ports:
Bus 005 Device 003: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)

Old 2010 server boards like the Tyan S8232 allow setting USB Sync in the BIOS.
A Focusrite USB interface has serious performance problems with USB Sync activated:
Sync limits performance.
Async means the two device clocks are different, but data is transferred at the maximum speed the link allows.

All USB interfaces
and all GPUs are async:
https://www.youtube.com/watch?v=m9qL7gfNxxs

SCSI Scanners are Async... >5MB/s
SCSI HDDs are Sync.

USB interfaces have an AD/DA clock: 44.1 or 48 kHz x 256 = 11.2896 or 12.288 MHz.
Some use x512 or x1024.
Some have a DSP or FPGA clock, a MIDI clock, and a USB clock.
Sometimes the DSP or FPGA clock is the same as the AD/DA clock, sometimes not.
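
As a rough sketch of the clock arithmetic above (the function name is only illustrative, and the multipliers are the x256 / x512 / x1024 cases mentioned), the AD/DA master clock is the sample rate times the multiplier:

```python
def master_clock_hz(sample_rate_hz: int, multiplier: int = 256) -> int:
    """Converter master clock = sample rate x oversampling multiplier."""
    return sample_rate_hz * multiplier


for fs in (44_100, 48_000):
    for mult in (256, 512, 1024):
        mhz = master_clock_hz(fs, mult) / 1e6
        print(f"{fs} Hz x {mult} = {mhz:.4f} MHz")
```

With x256 this gives 11.2896 MHz for 44.1 kHz and 12.288 MHz for 48 kHz.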

different AD/DA clocks have different phase noise characteristics,
affecting sound like different Dither Noise shaping algorithms.

#2.
The lowest latency possible is with DSP sound interfaces:
Pro Tools HD PCI-X / PCIe or HDX PCIe, plus converter AD/DA latency.
Maybe Avid Carbon, but it is Ethernet AVB.

Avid HD I/O AD/DA has 3 milliseconds, but it goes up to 6 ms depending on the plugins inserted: EQ, compressor, etc.
HDX DSP latency increases or changes as plugins are put in or out in series,
approx. <1 ms per DSP plugin; some have more, because
some algorithms try to "see the future" and adapt, like look-ahead true-peak limiters,
others use oversampling, etc.

HDX DSP has 4 samples of latency with no plugins and no AD/DA, because it is a complex DSP + FPGA.
The HD I/O DigiLink interface also has an Altera FPGA on each AD/DA board plus the chassis controller board.

Lynx AES16 & RME hdsp 9632 have basic DSP,  2 samples of latency.
Focusrite USB interfaces like Scarlett & Clarett mk2 also have basic DSP,

CPU latency is fixed and defined by buffer size / driver.
CPU processing has several problems:
millions of interrupts,
branch prediction algorithms,
speculative execution algorithms,
prediction & play / pause will never sound the same.
Some CPU algorithms are very close, designed to minimize C++ prediction, likely() and other methods,
like PSPaudioware MasterQ v1.02, which has 64-bit FP DP.
The newer MasterQ 2 has 80-bit, sounds different, I don't like it.
but testing same plugin/algorithm:
DSP vs. CPU
DSP always nicer.
like Avid Focusrite d2/d3 or older digidesign focusrite forte suite TDM.

New CPUs have better HPET and Better Prediction, but still can never be 100%.


Older 48 kHz AD/DA had a lot of latency;
1st-generation 192 kHz converters (2004-2005) also had a lot of latency.

Today, most USB interfaces like the Focusrite Scarlett or Clarett mk2 have near-latest AKM & Cirrus converters,
with near-zero-latency AD/DA + DSP + drivers.

Focusrite 18i20 mk2 outs 1 & 2 are Cirrus, 3-10 are AKM.
https://github.com/geoffreybennett/alsa-scarlett-gui

AKM has a bit less latency, Cirrus a bit more,
because Cirrus uses a fast voltage processor circuit to emulate smooth "analog" waveforms
after the sample & hold circuit and the brickwall (transition) filter:
https://src.infinitewave.ca/

similar to Arp 2600 LAG Voltage Processor, but much faster, calculated for each sample rate,
https://www.manualslib.com/manual/1211473/Arp-2600.html?page=46#manual
Other brands like Pioneer DJ use other DACs.

Another is the RME Fireface 800:
early FF800s had the 1st-gen 192 kHz DA AK4395;
since March 2005 they have had the AK4396,
a 2nd-generation 192 kHz IC with lower latency.
https://web.archive.org/web/20091229053519/http://www.rme-audio.de/download/fface800_e.pdf
page 96

DA latency (ms) by sample frequency:

fs (kHz)                | 44.1 | 48   | 88.2 | 96   | 176.4 | 192
AK4395 DA (43.5 x 1/fs) | 0.99 | 0.9  | 0.49 | 0.45 | 0.25  | 0.23
AK4396 DA (28 x 1/fs)   | 0.63 | 0.58 | 0.32 | 0.29 | 0.16  | 0.15
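
The table values follow from the stated sample delays divided by the sample rate. A small sketch to reproduce them (the function name is made up for illustration; the 43.5 and 28 sample delays are the figures quoted above):

```python
def da_latency_ms(delay_samples: float, sample_rate_khz: float) -> float:
    """Converter delay in ms: samples divided by sample rate in kHz."""
    return delay_samples / sample_rate_khz


rates_khz = (44.1, 48, 88.2, 96, 176.4, 192)
for chip, delay in (("AK4395", 43.5), ("AK4396", 28)):
    row = " | ".join(f"{da_latency_ms(delay, fs):.2f}" for fs in rates_khz)
    print(f"{chip}: {row}")
```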

-----------------------

The Apogee Rosetta 800 was originally 96 kHz;
after 2004-2005, some were upgraded to 192 kHz 1st-gen AD/DA.
https://www.soundonsound.com/reviews/apogee-rosetta-800

An old AD/DA + 512 buffer on PCI/PCIe has the same latency
as a new AD/DA + 1024 buffer on USB 2.
Likewise (old buffer = new buffer):
256 = 512
128 = 256
64 = 128
32 = 64

Latency drops when working at 2x or 4x sample rate (96 kHz / 192 kHz),
but working at 192 kHz is not good,
because it requires a clock with 1 picosecond of jitter = very expensive, which most interfaces don't have.
https://en.wikipedia.org/wiki/Analog-to-digital_converter#Jitter

Agilent Keysight TrueForm Generators have 1 pico second of jitter.
https://www.youtube.com/watch?v=HLPoSiorh30&t=126s
https://www.youtube.com/watch?v=1hxN3QPL4E4&t=52s

Most USB interfaces have a JetPLL clock. It's OK, but
not as good as a master clock like the Grimm Audio CC1.
The difference is small on small speakers, and big on large systems.

true dual Quartz XO with ultra low phase noise circuit design, 
PLL instead of Dual XO started around E-Mu Ultra 6000 samplers in 1999
https://www.vintagesynth.com/emu/emulator4

When using an external clock, there is another problem:
the signal is re-clocked again by a PLL included in the decoder IC.
Some PLLs are very strict, some are not...
This can be adjusted by replacing the resistor and capacitor (RC) of the external feedback circuit.
A strict PLL gives priority to the internal clock.
All inputs have different PLL low-pass filters;
the best is S/PDIF, because the voltage is very low (0.5 V) and the PLL filter is not aggressive.
WordClock and AES/EBU inputs all have a more aggressive PLL.

the cable requires very good shield, litz wire, good dielectrics, etc...

RME HDSP 9632 / Fireface 800 have an aggressive DDS, but it is "turned off" by default.
Lynx AES16 is the same: it has SynchroLock, a super long, slow PLL.
Most interfaces' PLLs are fixed.

Avid Carbon has 2xFS JetPLL = twice frequency JetPLL, promises lower jitter.
similar to Steinberg SSPLL in AXR interfaces,

#4.
The Linux Liquorix kernel / lowlatency kernel or the Windows 8.1 kernel
allow <8 ms latency at 96 kHz
over USB,
measured with an oscilloscope.
https://github.com/falkTX/Carla/issues/1912

Linux ALSA / JACK allows changing
frames and periods independently,
which means:
256 x2 = 128 x4
but... 256 x2 works with slower CPUs,
128 x4 requires a faster CPU to have the same latency = pointless.
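
To put numbers on that comparison, here is a minimal sketch (function names are illustrative, and the 48 kHz sample rate is an assumption, since no rate is given above). Both settings hold the same total buffer, but the smaller period wakes the CPU twice as often:

```python
def buffer_latency_ms(period_frames: int, periods: int, sample_rate_hz: int) -> float:
    """Total buffer length in ms: period size x number of periods."""
    return 1000 * period_frames * periods / sample_rate_hz


def period_interval_ms(period_frames: int, sample_rate_hz: int) -> float:
    """Time between period wake-ups, i.e. the deadline for each CPU callback."""
    return 1000 * period_frames / sample_rate_hz


RATE = 48_000  # assumed sample rate, not stated in the text
for frames, periods in ((256, 2), (128, 4)):
    total = buffer_latency_ms(frames, periods, RATE)
    interval = period_interval_ms(frames, RATE)
    print(f"{frames} x{periods}: buffer {total:.2f} ms, wake-up every {interval:.2f} ms")
```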

when CPU audio plugins are used at lowest latency possible,
32 buffer, CPU does less interrupts, and the CPU sound is more similar to DSP sound,
when 1x plugin eats 100% CPU load.

#5. Renesas has different USB 3 ICs:
the 200 series, included in some PCIe USB cards and the old Asus Rampage III Extreme LGA1366 board,
and the 201 / 202 series.
The latest Renesas/NEC uPD720202 firmware breaks compatibility with Mac OS X Mavericks,
https://www.station-drivers.com/index.php/en/component/remository/Drivers/Renesas-Nec/USB-3.0/lang,en-gb/

There is also the VIA VL800, which has weird USB drivers for Windows: the "latest" does not work, but the previous one works OK in Win 8.1.

_________________________________
From: Florian Paul Schmidt <mista.tapas@xxxxxxx>
Sent: Thursday, January 16, 2025 4:54 AM
To: linux-audio-dev@xxxxxxxxxxxxxxxxxxxx; linux-audio-user@xxxxxxxxxxxxxxxxxxxx
Subject: [LAD] Not all XHCI (USB-Controllers) are made equal

Hi!

This mail is just a heads-up about a finding discovered with the help of
the linux-usb mailing list and which I thought some other people might
benefit from. If this is nothing new to you, then please do ignore it.

Background:

The USB audio class 2.0 specification dictates that isochronous
transfers (i.e. audio frames to/from an audio interface) happen every
"micro-frame". USB micro-frames are 125 microseconds (us) apart. 125 us
= 0.125 milliseconds (ms).

The majority of USB audio interfaces (at least those that I have) use
synchronous audio-streaming, i.e. the sample clock is derived from the
bus clock. There are definitely interfaces that use e.g. adaptive or
asynchronous modes and this discussion would have to be altered for these.

Given the case of synchronous mode isochronous transfers at a sampling
rate of 48000 Hz (= 48 kHz) this would correspond to 6 audio frames per
USB micro-frame.

48000 frames/second * 0.000125 seconds = 6 frames

So in principle a very well behaved audio interface attached to a very
well behaved USB controller sitting in a well tuned system _should_ be
able to achieve a minimum round-trip latency of 2 * 6 frames = 12 frames
or 250 us. This leaves out additional buffering inside the audio
interface and additional latency by anti-aliasing and reconstruction
filters.
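
The arithmetic above can be sketched as follows (function names are illustrative; the 8000 micro-frames per second simply restates the 125 us spacing):

```python
MICROFRAMES_PER_SECOND = 8000  # one USB micro-frame every 125 us


def frames_per_microframe(sample_rate_hz: int) -> float:
    """Audio frames carried per USB micro-frame in synchronous mode."""
    return sample_rate_hz / MICROFRAMES_PER_SECOND


def min_round_trip_us(sample_rate_hz: int) -> float:
    """One micro-frame in plus one micro-frame out, expressed in microseconds."""
    frames = 2 * frames_per_microframe(sample_rate_hz)
    return 1_000_000 * frames / sample_rate_hz


print(frames_per_microframe(48_000))  # frames per micro-frame at 48 kHz
print(min_round_trip_us(48_000))      # theoretical minimum round trip in us
```

At 48 kHz this gives 6 frames per micro-frame and a 250 us minimum, before any converter or interface buffering is added.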

The first caveat to above in the context of Linux: The snd_usb_audio
driver does not seem to support period sizes of just 6 frames. It _does_
support 12 frames though, which is nice.

And here is the other caveat which led to the title of this mail: Some
controllers are behaving "worse" than others. There's this little thing
in the XHCI spec which amounts to the following: The XHCI can specify in
a register how many micro-frames have to be buffered at all times for
outgoing isochronous endpoints. The Intel XHCI in my ASRock N100DC-ITX
main-board for example requests a whole USB frame (which corresponds to 8
micro-frames or 1 ms) to be buffered at all times. Another XHCI that I
have (a Renesas controller) only requires one micro-frame. This has
direct consequences on the kind of period sizes and number of periods
that are usable on these controllers. These consequences don't explain
everything but at least you know that you can't ever get better than
this limit.

For example for a period size of 48 frames I need to use 3 periods on
the Intel controller (resulting in 3 ms round-trip latency), but for the
Renesas I can use 2 periods at 48 frames (resulting in 2 ms round-trip
latency). On the Renesas controller even 2 periods at 24 frames works
fine (1 ms round-trip latency).

I can lower the latency on the Intel controller by using a smaller
period size but more of them as long as the buffering requirement of the
controller is satisfied. One stable setting is for example a period size
of 24 and 5 periods which results in a round-trip latency of 2.5 ms.
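
For reference, a small sketch that reproduces the latency figures quoted for these settings and converts each controller's buffering requirement into audio frames at 48 kHz. The function names are made up for illustration, and the exact rule linking the requirement to which settings run stably is not spelled out here, so the sketch only prints the numbers:

```python
MICROFRAMES_PER_SECOND = 8000  # one USB micro-frame every 125 us


def round_trip_ms(period_frames: int, periods: int, sample_rate_hz: int = 48_000) -> float:
    """Latency as counted in this mail: period size x number of periods."""
    return 1000 * period_frames * periods / sample_rate_hz


def controller_buffer_frames(microframes: int, sample_rate_hz: int = 48_000) -> float:
    """Audio frames covered by the micro-frames a controller wants buffered."""
    return microframes * sample_rate_hz / MICROFRAMES_PER_SECOND


settings = [
    ("Intel", 48, 3),
    ("Renesas", 48, 2),
    ("Renesas", 24, 2),
    ("Intel", 24, 5),
]
for controller, period, periods in settings:
    ms = round_trip_ms(period, periods)
    print(f"{controller}: {periods} periods x {period} frames = {ms:.1f} ms")

print("Intel (8 micro-frames):", controller_buffer_frames(8), "frames")
print("Renesas (1 micro-frame):", controller_buffer_frames(1), "frames")
```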

So what's the take-away here? In some cases, if you are chasing stable
low-latency operation using a USB audio class 2.0 device it might just
be worth installing a different XHCI in your computer (the above
mentioned Renesas controller is just a PCI-Express card which sits in a
slot in my N100DC-itx board) that is better behaved than the one you
currently have.

Another take-away is that the above limitation only applies to the
outgoing direction (playback). If all I were interested in was the
capture direction then 2 periods at 48 frames would work fine even on
the Intel controller.

Kind regards,
FPS



