Hi Mathias, On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote: > Hi Mathias, > > On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote: > > On 27.06.2018 14:59, Sudip Mukherjee wrote: > > > > > Can you share a bit more details on the platform you are using, and what types of test you are running. > > > > <snip> > Then to track what is going on, I added the slub debugging and :( > I have attached part of dmesg for you to check. > Will appreciate your help in finding out the problem. I did some more debugging. Tested with a KASAN enabled kernel and that shows the problem. The report is attached. To my understanding: btusb_work() is calling usb_set_interface() with alternate = 0. which again calls usb_hcd_alloc_bandwidth() and that frees the rings by xhci_free_endpoint_ring(). But then usb_set_interface() continues and calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()-> xhci_urb_dequeue() which at the end gives the command to stop endpoint. In all the cycles I have tested I see that only in the fail case handle_cmd_completion() gets called, but in the cycles where the error is not there handle_cmd_completion() is not called with that command. I am not sure what is happening, and you are the best person to understand what is happening. :) But for now (untill you are back from holiday and suggest a proper solution), I made a hacky patch (attached) which is working and I donot get any corruption after that. Both KASAN and slub debug are also happy. So, now waiting for you to analyze what is going on and suggest a proper fix. Thanks in advance. -- Regards Sudip
[ 236.814156] ================================================================== [ 236.814187] BUG: KASAN: use-after-free in xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd] [ 236.814193] Read of size 8 at addr ffff8800789329c8 by task weston/138 [ 236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G U W O 4.14.47-20180606+ #7 [ 236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017 [ 236.814209] Call Trace: [ 236.814214] <IRQ> [ 236.814226] dump_stack+0x46/0x59 [ 236.814238] print_address_description+0x6b/0x23b [ 236.814255] ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd] [ 236.814262] kasan_report+0x220/0x246 [ 236.814278] xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd] [ 236.814294] trb_in_td+0x3b/0x1cd [xhci_hcd] [ 236.814311] handle_cmd_completion+0x1181/0x2c9b [xhci_hcd] [ 236.814329] ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd] [ 236.814337] ? drm_handle_vblank+0x4ec/0x590 [ 236.814352] xhci_irq+0x529/0x3294 [xhci_hcd] [ 236.814362] ? __accumulate_pelt_segments+0x24/0x33 [ 236.814378] ? finish_td.isra.40+0x223/0x223 [xhci_hcd] [ 236.814384] ? __accumulate_pelt_segments+0x24/0x33 [ 236.814390] ? __accumulate_pelt_segments+0x24/0x33 [ 236.814405] ? xhci_irq+0x3294/0x3294 [xhci_hcd] [ 236.814412] __handle_irq_event_percpu+0x149/0x3db [ 236.814421] handle_irq_event_percpu+0x65/0x109 [ 236.814428] ? __handle_irq_event_percpu+0x3db/0x3db [ 236.814436] ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce [ 236.814442] handle_irq_event+0xa8/0x10a [ 236.814449] handle_edge_irq+0x4b2/0x538 [ 236.814458] handle_irq+0x3e/0x45 [ 236.814465] do_IRQ+0x5c/0x126 [ 236.814474] common_interrupt+0x7a/0x7a [ 236.814478] </IRQ> [ 236.814483] RIP: 0023:0xf79d3d82 [ 236.814486] RSP: 002b:00000000ffc588e8 EFLAGS: 00200282 ORIG_RAX: ffffffffffffffdc [ 236.814493] RAX: 0000000000000000 RBX: 00000000f7bebd5c RCX: 0000000000000000 [ 236.814496] RDX: 0000000008d4197c RSI: 0000000000000000 RDI: 00000000f746c020 [ 236.814499] RBP: 00000000ffc588e8 R08: 0000000000000000 R09: 0000000000000000 [ 236.814503] R10: 0000000000000000 R11: 0000000000200206 R12: 0000000000000000 [ 236.814506] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 236.814513] Allocated by task 2082: [ 236.814521] kasan_kmalloc.part.1+0x51/0xc7 [ 236.814526] kmem_cache_alloc_trace+0x178/0x187 [ 236.814540] xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd] [ 236.814553] xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd] [ 236.814566] xhci_ring_alloc.constprop.16+0x197/0x4ba [xhci_hcd] [ 236.814579] xhci_endpoint_init+0x77a/0x9ba [xhci_hcd] [ 236.814592] xhci_add_endpoint+0x3bc/0x43b [xhci_hcd] [ 236.814615] usb_hcd_alloc_bandwidth+0x7ef/0x857 [usbcore] [ 236.814637] usb_set_interface+0x294/0x681 [usbcore] [ 236.814645] btusb_work+0x2e6/0x981 [btusb] [ 236.814651] process_one_work+0x579/0x9e9 [ 236.814656] worker_thread+0x68f/0x804 [ 236.814662] kthread+0x31c/0x32b [ 236.814668] ret_from_fork+0x35/0x40 [ 236.814672] Freed by task 1533: [ 236.814678] kasan_slab_free+0xb3/0x15e [ 236.814683] kfree+0x103/0x1a9 [ 236.814696] xhci_ring_free+0x205/0x286 [xhci_hcd] [ 236.814709] xhci_free_endpoint_ring+0x4d/0x83 [xhci_hcd] [ 236.814722] xhci_check_bandwidth+0x57b/0x65a [xhci_hcd] [ 236.814743] usb_hcd_alloc_bandwidth+0x665/0x857 [usbcore] [ 236.814765] usb_set_interface+0x294/0x681 [usbcore] [ 236.814772] btusb_work+0x664/0x981 [btusb] [ 236.814777] process_one_work+0x579/0x9e9 [ 236.814782] worker_thread+0x68f/0x804 [ 236.814788] kthread+0x31c/0x32b [ 236.814793] ret_from_fork+0x35/0x40 [ 236.814799] The buggy address belongs to the object at ffff8800789329c8 which belongs to the cache kmalloc-64 of size 64 [ 236.814804] The buggy address is located 0 bytes inside of 64-byte region [ffff8800789329c8, ffff880078932a08) [ 236.814806] The buggy address belongs to the page: [ 236.814812] page:ffffea0001e24c80 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 [ 236.825813] flags: 0x4000000000008100(slab|head) [ 236.830981] raw: 4000000000008100 0000000000000000 0000000000000000 0000000100130013 [ 236.830988] raw: ffffea0000cfbaa0 ffffea00010ddf20 ffff88013b80f640 0000000000000000 [ 236.830990] page dumped because: kasan: bad access detected [ 236.830993] Memory state around the buggy address: [ 236.830999] ffff880078932880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 236.831004] ffff880078932900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 236.831008] >ffff880078932980: fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb [ 236.831011] ^ [ 236.831015] ffff880078932a00: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 236.831019] ffff880078932a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 236.831021] ================================================================== [ 236.831024] Disabling lock debugging due to kernel taint
>From cbbe6dc59ac90a4f2c358de56e58e254320171e0 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee <sudipm.mukherjee@xxxxxxxxx> Date: Tue, 10 Jul 2018 09:50:00 +0100 Subject: [PATCH] hacky solution to mem-corruption Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@xxxxxxxxx> --- drivers/usb/core/message.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c index 7cd4ec33dbf4..7fdf7a27611d 100644 --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1398,7 +1398,8 @@ int usb_set_interface(struct usb_device *dev, int interface, int alternate) remove_intf_ep_devs(iface); usb_remove_sysfs_intf_files(iface); } - usb_disable_interface(dev, iface, true); + if (!(iface->cur_altsetting && alt)) + usb_disable_interface(dev, iface, true); iface->cur_altsetting = alt; -- 2.11.0