Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)

Maik Stohn <maik.stohn@xxxxxxxxxxxx> · Wed, 24 Jul 2019 16:34:06 +0200



> 
> On 24.07.2019 at 16:20, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
>> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) 
>> 
>> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
>> 
>> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
>> 
>> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
>> 
>> Reproduce (tested on several different machines with 5.2, 5.2.1, 5.2.2, 5.3-rc1):
>> 
>> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>> - command line tool "sg_raw" from "sg3-utils"
>> - execute the following, then press a key followed by return (-s1 sends one byte, which is read from stdin):
>>   $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>> 
>> -> KERNEL Oops
>> 
>> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes; 8 bytes is exactly the maximum on my 64-bit machine for which the "DMA bypass optimization / IDT" kicks in, see below)
>> 
>> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
>> 
>> ---
>> 
>> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
>> 
>> Reason: NULL pointer dereference
>> 
>> ---
>> 
>> It took me quite some time to find the cause of this.
>> 
>> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
>> 
>> ...
>>    while (sg && sent_len >= block_len) {
>>      /* New sg entry */
>>      --num_sgs;
>>      sent_len -= block_len;
>>      if (num_sgs != 0) {
>>        sg = sg_next(sg);
>>        block_len = sg_dma_len(sg);           <================= CRASH
>>          ^ The comment on "sg_dma_len" clearly states "These macros should be used
>>            after a dma_map_sg call has been done..." - a call that was omitted by the
>>            new "xhci_map_urb_for_dma" function since the transfer was considered
>>            suitable for IDT.
>>        addr = (u64) sg_dma_address(sg);
>>        addr += sent_len;
>>      }
>>    }
>>    block_len -= sent_len;
>>    send_addr = addr;
>> ...
>> 
>> This only happens if the transfer was considered suitable for IDT.
>> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine. 
>> 
>> 
>> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.
> 
> What patch did you find that caused this regression?  We can revert it
> if that is the easiest thing to do.
> 
> thanks,
> 
> greg k-h

I included the patch causing it above: https://patchwork.kernel.org/patch/10919167/  


Greetings, 

Maik Stohn
