On Thu, Apr 11, 2024 at 11:24 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a general protection fault in function disable_store.
> > It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> > and kernel config are attached to this email. Kernel crash log is
> > listed below.
> > ```
> > general protection fault, probably for non-canonical address
> > 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> > CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> > Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> > ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> > RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> > RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> > RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> > R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> > FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
>
> > ----------------
> > Code disassembly (best guess):
> >    0: 02 00                  add %al,(%rax) as decoded: add (%rax),%al
> >    2: 00 4c 8b 75            add %cl,0x75(%rbx,%rcx,4)
> >    6: 40                     rex
> >    7: 4d 8d be 58 ff ff ff   lea -0xa8(%r14),%r15
> >    e: 4c 89 ff               mov %r15,%rdi
> >   11: e8 a4 20 fa ff         call 0xfffa20ba
> >   16: 48 89 c2               mov %rax,%rdx
> >   19: 48 89 c5               mov %rax,%rbp
> >   1c: 48 b8 00 00 00 00 00   movabs $0xdffffc0000000000,%rax
> >   23: fc ff df
> >   26: 48 c1 ea 03            shr $0x3,%rdx
> > * 2a: 80 3c 02 00            cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
> >   2e: 0f 85 b0 02 00 00      jne 0x2e4
> >   34: 48 8b 45 00            mov 0x0(%rbp),%rax
> >   38: 48 8d bb 34 05 00 00   lea 0x534(%rbx),%rdi
> >   3f: 48                     rex.W
> > ```
> > We analyzed the root cause of this bug. When calling disable_store()
> > in drivers/usb/core/port.c, if function authorized_store() is calling
> > usb_deauthorized_device() concurrently, the usb_interface will be
> > removed by usb_disable_device. However, in function disable_store,
> > usb_hub_to_struct_hub() would try to deref interface, causing
> > nullptr-deref. We also tested other functions in
> > drivers/usb/core/port.c. So far we haven't found a similar problem.
>
> I don't see how this explanation could be correct. disable_store() is a
> sysfs attribute file for the port device, so when it is called the port
> device structure must still be registered. The interface structure
> doesn't get removed until after usb_disable_device() calls device_del(),
> which won't return until hub_disconnect() returns, which won't happen
> until after the port devices are unregistered, which doesn't happen
> until disable_store() calls sysfs_break_active_protection(), which is
> after the call to usb_hub_to_struct_hub().
>
> Can you do a little extra debugging to find out exactly which C
> statement causes the trap? The disassembly above indicates the trap
> happens during a compare against 0 inside disable_store() -- not inside
> usb_hub_to_struct_hub(). Can you figure out which comparison that is?
>

Sorry for the mistake I made when debugging this bug. Now I have more
information about it. Disassembly of function disable_store() in the
latest upstream kernel is listed below.
```
Dump of assembler code for function disable_store:
...
   0xffffffff86e907eb <+187>: lea    -0x8(%r14),%r12
   0xffffffff86e907ef <+191>: mov    (%rbx),%rax
   0xffffffff86e907f2 <+194>: mov    %rax,0x20(%rsp)
   0xffffffff86e907f7 <+199>: lea    -0xa8(%rax),%rdi
   0xffffffff86e907fe <+206>: mov    %rdi,0x18(%rsp)
   0xffffffff86e90803 <+211>: call   0xffffffff86e20220 <usb_hub_to_struct_hub>
   0xffffffff86e90808 <+216>: mov    %rax,%rbx
   0xffffffff86e9080b <+219>: shr    $0x3,%rax
   0xffffffff86e9080f <+223>: movabs $0xdffffc0000000000,%rcx
   0xffffffff86e90819 <+233>: cmpb   $0x0,(%rax,%rcx,1)
   0xffffffff86e9081d <+237>: je     0xffffffff86e90827 <disable_store+247>
   0xffffffff86e9081f <+239>: mov    %rbx,%rdi
   0xffffffff86e90822 <+242>: call   0xffffffff81eeb0b0 <__asan_report_load8_noabort>
   0xffffffff86e90827 <+247>: lea    0x60(%rsp),%rsi
...
```

The cmpb at disable_store+233 is generated by KASAN to check the shadow
memory status. If the shadow byte equals 0, the 8-byte load is valid and
the KASAN check passes. This time, however, rax is 0, so the shadow check
itself triggers the general protection fault, since 0xdffffc0000000000 is
a non-canonical address. rax holds the return value of
usb_hub_to_struct_hub() (shifted right by 3), which in this case is NULL.

In function usb_hub_to_struct_hub(), I checked hdev and its member
fields, and they are not NULL. Is it possible that
usb_deauthorized_device() set
hdev->actconfig->interface[0]->dev.driver_data to NULL? I cannot confirm
this, because every time I set a breakpoint in that code the kernel
crashes differently.

If there is anything else I can help with, please let me know.

Best,
Yue

> Alan Stern
>
> > If you have any questions, please contact us.
> >
> > Reported by Yue Sun <samsun1006219@xxxxxxxxx>
> > Reported by xingwei lee <xrivendell7@xxxxxxxxx>
> >
> > Best Regards,
> > Yue
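
For readers following the shadow-memory reasoning in the reply above, here is
a minimal userspace sketch of the arithmetic that the inline KASAN check
performs. The shift (3) and offset (0xdffffc0000000000) are taken from the
disassembly in this thread. The helper names shadow_addr() and
is_canonical_48() are hypothetical and are not kernel functions, and the
canonical check assumes 48-bit virtual addresses (4-level paging).

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as they appear in the disassembly: shr $0x3 and
 * movabs $0xdffffc0000000000. */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET      0xdffffc0000000000ULL

/* Hypothetical helper: address of the shadow byte that the inline
 * check reads before touching addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Hypothetical helper, assuming 48-bit virtual addresses:
 * an address is canonical when bits 63..47 are all 0 or all 1. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

With a NULL pointer, shadow_addr(0) is exactly 0xdffffc0000000000, and
is_canonical_48() rejects it, which matches the "non-canonical address"
wording in the crash log. Any pointer in the range 0x0-0x7 maps to the same
shadow byte, matching the range KASAN reports. A normal kernel pointer such
as the R15 value from the log (0xffff88801d4d1000) maps to a canonical shadow
address.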