Dexuan Cui <decui@xxxxxxxxxxxxx> writes:

> There is a rare race when we remove an entry from the global list
> hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
> process_chn_event() -> pcpu_relid2channel() is trying to query the list,
> we can get a kernel fault.
>
> Similarly, we also have the issue in the code path: vmbus_process_offer() ->
> percpu_channel_enq().
>
> We can resolve the issue by disabling the tasklet when updating the list.
>
> The patch also moves vmbus_release_relid() to a later place, where
> the channel has been removed from the per-cpu and the global lists.
>
> Reported-by: Rolf Neugebauer <rolf.neugebauer@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
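
(For context, the fix described above amounts to bracketing the list
update with tasklet_disable()/tasklet_enable() on the per-cpu event
tasklet, roughly as in the sketch below. This is an illustration based
on the 4.7-era drivers/hv code, not the actual diff; in particular,
hv_context.event_dpc[] as the tasklet handle is an assumption here.)

	/*
	 * Sketch only, not the actual patch: quiesce the per-cpu event
	 * tasklet (which runs vmbus_on_event() -> process_chn_event() ->
	 * pcpu_relid2channel()) while the channel is unlinked, so the
	 * reader can never walk hv_context.percpu_list[cpu] in the
	 * middle of a list_del().
	 */
	tasklet_disable(hv_context.event_dpc[channel->target_cpu]);
	percpu_channel_deq(channel);
	tasklet_enable(hv_context.event_dpc[channel->target_cpu]);

	/*
	 * Per the patch description, release the relid only after the
	 * channel is off both the per-cpu and the global lists.
	 */
	vmbus_release_relid(relid);
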
Tested 4.7-rc1 with this patch applied and the kernel always crashes on
boot (WS2016TP5, 12-CPU SMP guest, Generation 2):

[    5.464251] hv_vmbus: Hyper-V Host Build:14300-10.0-1-0.1006; Vmbus version:4.0
[    5.471666] hv_vmbus: Unknown GUID: f8e65716-3cb3-4a06-9a60-1889c5cccab5
[    5.472143] BUG: unable to handle kernel paging request at 000000079fff5288
[    5.477107] IP: [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] PGD 0
[    5.477107] Oops: 0000 [#1] SMP
[    5.477107] Modules linked in: hv_vmbus
[    5.477107] CPU: 11 PID: 189 Comm: kworker/11:1 Not tainted 4.7.0-rc1_dc1_test+ #262
[    5.477107] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[    5.477107] Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
[    5.477107] task: ffff8801796e4480 ti: ffff8801796e8000 task.ti: ffff8801796e8000
[    5.477107] RIP: 0010:[<ffffffffa0004b91>]  [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] RSP: 0018:ffff8801796ebc50  EFLAGS: 00010286
[    5.477107] RAX: 00000000ffff8801 RBX: ffff880032641000 RCX: 0000000000000050
[    5.477107] RDX: 0000000000040000 RSI: 0000000000000000 RDI: ffff880032641000
[    5.477107] RBP: ffff8801796ebd10 R08: 0000000000000001 R09: 0000000000000001
[    5.477107] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000010
[    5.477107] R13: 4a063cb3f8e65716 R14: b5caccc58918609a R15: ffffffffa0008b60
[    5.477107] FS:  0000000000000000(0000) GS:ffff88017c000000(0000) knlGS:0000000000000000
[    5.477107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.477107] CR2: 000000079fff5288 CR3: 0000000032613000 CR4: 00000000001406e0
[    5.477107] Stack:
[    5.477107]  ffff880032641780 ffff88003264102c 0010010000000046 ffffffffa000646e
[    5.477107]  ffff8801796e5090 ffff8801796e4480 00000000004f827d 0000000000000001
[    5.477107]  0000000000000000 ffff8801796ebce8 ffffffff810eaebc 00000000796e5058
[    5.477107] Call Trace:
[    5.477107]  [<ffffffff810eaebc>] ? __lock_acquire+0x3dc/0x730
[    5.477107]  [<ffffffffa0005263>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
[    5.477107]  [<ffffffffa0001371>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
[    5.653321]  [<ffffffff810abd1f>] process_one_work+0x1ff/0x6d0
[    5.653321]  [<ffffffff810abca1>] ? process_one_work+0x181/0x6d0
[    5.653321]  [<ffffffff810ac23e>] worker_thread+0x4e/0x490
[    5.653321]  [<ffffffff810ac1f0>] ? process_one_work+0x6d0/0x6d0
[    5.653321]  [<ffffffff810ac1f0>] ? process_one_work+0x6d0/0x6d0
[    5.653321]  [<ffffffff810b31b1>] kthread+0x101/0x120
[    5.653321]  [<ffffffff81739cef>] ret_from_fork+0x1f/0x40
[    5.653321]  [<ffffffff810b30b0>] ? kthread_create_on_node+0x250/0x250
[    5.653321] Code: 74 24 08 48 c7 c7 60 6c 00 a0 e8 0a 9e 1b e1 b8 10 00 00 00 66 89 44 24 16 44 89 e6 48 89 df e8 f6 f9 ff ff 41 8b 87 f4 02 00 00 <48> 8b 14 c5 80 12 03 a0 f0 ff 42 10 48 8b 42 08 a8 02 75 f8 0f
[    5.653321] RIP  [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.653321]  RSP <ffff8801796ebc50>
[    5.653321] CR2: 000000079fff5288
[    5.653321] ---[ end trace 62df6070997f1f10 ]---
[    5.653321] Kernel panic - not syncing: Fatal exception
[    5.653321] Kernel Offset: disabled
[    5.653321] ---[ end Kernel panic - not syncing: Fatal exception
[    5.653480] ------------[ cut here ]------------

I can investigate it tomorrow if this doesn't reproduce for you.

<skip>

--
Vitaly
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel