Re: [Bug 217242] CPU hard lockup related to xhci/dma

Hans Petter Selasky <hps@xxxxxxxxxxx> · Wed, 5 Apr 2023 20:15:25 +0200

On 4/2/23 20:57, Alan Stern wrote:
[Bugzilla removed from the CC: list, since this isn't relevant to the bug
report]

On Sun, Apr 02, 2023 at 07:25:27PM +0200, Greg KH wrote:
On Sun, Apr 02, 2023 at 05:54:18PM +0200, Hans Petter Selasky wrote:
That being said, I wish the Linux USB core would follow the example of
the FreeBSD USB core and pre-allocate all memory needed for USB transfers,
also called URBs, during device attach.

Many drivers already do that today. Which specific ones do you think
need to have this added that are not doing so?

Hans is undoubtedly referring to the host controller drivers.

Hi Alan,

Yes, I'm on the USB host side this time.

usb_alloc_urb() allocates memory for the URB itself.  But the routine does
not know which device or host controller the URB will eventually be used
with, so it doesn't know which HCD to tell to set aside adequate memory
for handling the URB once it is submitted.  And since HCDs tend to process
URB submissions while holding a private spinlock, when their memory
allocation does get done it cannot use GFP_KERNEL.

I remember a long time ago, when memory allocation was very slow in
FreeBSD, that testing the USB control endpoint was difficult without
using 100% CPU at the same time. The reason was that user-space
applications used IOCTLs to do USB control endpoint requests
synchronously, which led to the request data being allocated and freed
regularly. That was before jemalloc and per-CPU slabs. It was not the
amount of data that caused problems, but the request rate, typically
1000 - 8000 requests per second. Finding free holes in memory bitmaps
due to fragmentation is _very_ expensive!

I think it's fair to call this a weak point in Linux's USB stack.
Balancing this, it should be pointed out that we can't always know in
advance how large an URB's transfer buffer will be, and the amount of
memory that the HCD will need can depend on this size.

In FreeBSD you have to specify a maximum length in bytes per "urb", or
FreeBSD USB transfer, along with various other static properties. You
then don't allocate and free those URBs, so to speak, but just keep
re-using them after the first allocation. All XHCI DMA structures are
simply pre-allocated: because we know the PAGE_SIZE and how things are
laid out in memory, it's easy to compute exactly the worst and best
case for the number of hardware structures you need.
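
To illustrate the kind of arithmetic I mean, here is a minimal sketch. It
assumes one hardware descriptor per memory page touched by the buffer,
which is a simplification for illustration and not lifted from any real
XHCI driver. The function names are made up as well.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers: the names and the one-descriptor-per-page rule
 * are assumptions for illustration, not taken from any real driver.
 */

/* Best case: the buffer starts exactly on a page boundary. */
static size_t xfer_pages_best(size_t max_len, size_t page_size)
{
    assert(page_size > 0);
    return (max_len + page_size - 1) / page_size;
}

/* Worst case: the buffer starts on the last byte of a page. */
static size_t xfer_pages_worst(size_t max_len, size_t page_size)
{
    assert(page_size > 0);
    if (max_len == 0)
        return 0;
    /* First byte occupies one page; the rest spills into later pages. */
    return (max_len - 1 + page_size - 1) / page_size + 1;
}
```

With a maximum length fixed at attach time, the worst-case value is what
you would reserve once, and then never allocate again for that transfer.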

This is also very useful for boot loaders: FreeBSD USB can run either
fully single-threaded with a few fixed-size memory pools, or
multi-threaded as part of a bigger OS.
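
A fixed-size pool of that kind can be sketched roughly as below. This is
only an illustration of the idea, not FreeBSD code: the structure, the
function names and the slot sizes are all assumptions, and there is no
locking because the sketch targets the single-threaded case.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4    /* assumed number of transfers, fixed at attach */
#define SLOT_BYTES 512  /* assumed maximum transfer length in bytes */

/* Hypothetical pool: all memory is reserved up front, never malloc'ed. */
struct xfer_pool {
    unsigned char data[POOL_SLOTS][SLOT_BYTES];
    unsigned char in_use[POOL_SLOTS];
};

static void pool_init(struct xfer_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_SLOTS; i++)
        p->in_use[i] = 0;
}

/* Hand out the first free slot, or NULL when the pool is exhausted. */
static void *pool_get(struct xfer_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->data[i];
        }
    }
    return NULL;
}

/* Return a slot for re-use; -1 if buf is not a slot currently in use. */
static int pool_put(struct xfer_pool *p, void *buf)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (buf == (void *)p->data[i] && p->in_use[i]) {
            p->in_use[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

The point is that getting and returning a slot never touches a general
purpose allocator, so the cost per request stays constant and small.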

Frequently going through allocate
and free cycles during operation, is not just inefficient, but also greatly

In fact, the original Slab memory allocator (in Solaris 2.4) was designed
to make frequent allocate-and-free cycles extremely efficient.  So much so
that people would just naturally do things that way instead of
pre-allocating memory which would then just sit around unused a large
fraction of the time.

I suspect the allocators in the Linux kernel don't end up being quite as
efficient as the original Slab, however.

FreeBSD USB is a completely different design compared to Linux. Anyway, 
back to the topic and thanks for the chat :-)

--HPS