Re: [PATCH v3] Drivers: hv: vmbus: fix the race when querying & updating the percpu list

Dexuan Cui <decui@xxxxxxxxxxxxx> writes:

> There is a rare race when we remove an entry from the global list
> hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
> process_chn_event() -> pcpu_relid2channel() is trying to query the list,
> we can get a kernel fault.
>
> Similarly, we also have the issue in the code path: vmbus_process_offer() ->
> percpu_channel_enq().
>
> We can resolve the issue by disabling the tasklet when updating the list.
>
> The patch also moves vmbus_release_relid() to a later place where
> the channel has been removed from the per-cpu and the global lists.
>
> Reported-by: Rolf Neugebauer <rolf.neugebauer@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
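
The race in the quoted description is a reader/updater conflict on one CPU's list: the tasklet walks hv_context.percpu_list[cpu] while another context links or unlinks an entry. What follows is a userspace sketch of the idea behind the fix, not the patch itself. It rests on three assumptions. First, every sim_* name is hypothetical. Second, the per-CPU list is a plain singly linked list. Third, tasklet_disable()/tasklet_enable() are modeled with a pthread mutex that the simulated tasklet body also takes. Unlike a real disabled tasklet, which is deferred until it is re-enabled, a blocked run here simply waits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a channel on one CPU's list. */
struct sim_channel {
    unsigned int relid;
    struct sim_channel *next;
};

/* Hypothetical stand-in for one hv_context.percpu_list[cpu] and its tasklet. */
struct sim_percpu {
    struct sim_channel *head;
    pthread_mutex_t tasklet_lock; /* models tasklet_disable()/tasklet_enable() */
};

static void sim_percpu_init(struct sim_percpu *p)
{
    p->head = NULL;
    pthread_mutex_init(&p->tasklet_lock, NULL);
}

/* "Disable": wait for a running tasklet body to finish and block new runs. */
static void sim_tasklet_disable(struct sim_percpu *p) { pthread_mutex_lock(&p->tasklet_lock); }
static void sim_tasklet_enable(struct sim_percpu *p) { pthread_mutex_unlock(&p->tasklet_lock); }

/* One run of the simulated tasklet; the walk plays the role of pcpu_relid2channel(). */
static bool sim_tasklet_run(struct sim_percpu *p, unsigned int relid)
{
    bool found = false;
    pthread_mutex_lock(&p->tasklet_lock);
    for (struct sim_channel *c = p->head; c != NULL && !found; c = c->next)
        found = (c->relid == relid);
    pthread_mutex_unlock(&p->tasklet_lock);
    return found;
}

/* Updater in the role of percpu_channel_enq(): link with the tasklet disabled. */
static void sim_channel_enq(struct sim_percpu *p, struct sim_channel *c)
{
    sim_tasklet_disable(p);
    c->next = p->head;
    p->head = c;
    sim_tasklet_enable(p);
}

/* Updater in the role of percpu_channel_deq(): unlink with the tasklet disabled. */
static void sim_channel_deq(struct sim_percpu *p, struct sim_channel *c)
{
    sim_tasklet_disable(p);
    for (struct sim_channel **pp = &p->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    sim_tasklet_enable(p);
}

struct sim_reader_arg {
    struct sim_percpu *p;
    atomic_bool stop;
    unsigned long runs;
};

/* Reader thread: keeps running the simulated tasklet until told to stop. */
static void *sim_reader(void *argp)
{
    struct sim_reader_arg *a = argp;
    while (!atomic_load(&a->stop)) {
        (void)sim_tasklet_run(a->p, 1);
        a->runs++;
    }
    return NULL;
}

/* Adds/removes relid 1 while a reader looks it up; returns lookups done (may be 0). */
static unsigned long sim_demo(int iterations)
{
    struct sim_percpu p;
    struct sim_reader_arg a = { .p = &p, .runs = 0 };
    pthread_t t;
    sim_percpu_init(&p);
    atomic_init(&a.stop, false);
    if (pthread_create(&t, NULL, sim_reader, &a) != 0)
        return 0;
    for (int i = 0; i < iterations; i++) {
        struct sim_channel *c = malloc(sizeof *c);
        if (c == NULL) break;
        c->relid = 1;
        sim_channel_enq(&p, c);
        sim_channel_deq(&p, c);
        free(c); /* safe: c is unlinked and no lookup can be mid-walk over it */
    }
    atomic_store(&a.stop, true);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&p.tasklet_lock);
    return a.runs;
}
```

Calling sim_demo(2000) from main() exercises the concurrent path. If the sim_tasklet_disable()/sim_tasklet_enable() pair is removed from the updaters, the window described in the patch reopens: a lookup can follow a pointer into an entry that has already been freed. Freeing an entry only after it has been unlinked loosely mirrors the patch's move of vmbus_release_relid() to after the channel has left the lists.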

Tested 4.7-rc1 with this patch applied, and the kernel always crashes on boot
(WS2016TP5, 12 CPU SMP guest, Generation 2):

[    5.464251] hv_vmbus: Hyper-V Host Build:14300-10.0-1-0.1006; Vmbus version:4.0
[    5.471666] hv_vmbus: Unknown GUID: f8e65716-3cb3-4a06-9a60-1889c5cccab5
[    5.472143] BUG: unable to handle kernel paging request at 000000079fff5288
[    5.477107] IP: [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] PGD 0 
[    5.477107] Oops: 0000 [#1] SMP
[    5.477107] Modules linked in: hv_vmbus
[    5.477107] CPU: 11 PID: 189 Comm: kworker/11:1 Not tainted 4.7.0-rc1_dc1_test+ #262
[    5.477107] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[    5.477107] Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
[    5.477107] task: ffff8801796e4480 ti: ffff8801796e8000 task.ti: ffff8801796e8000
[    5.477107] RIP: 0010:[<ffffffffa0004b91>]  [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] RSP: 0018:ffff8801796ebc50  EFLAGS: 00010286
[    5.477107] RAX: 00000000ffff8801 RBX: ffff880032641000 RCX: 0000000000000050
[    5.477107] RDX: 0000000000040000 RSI: 0000000000000000 RDI: ffff880032641000
[    5.477107] RBP: ffff8801796ebd10 R08: 0000000000000001 R09: 0000000000000001
[    5.477107] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000010
[    5.477107] R13: 4a063cb3f8e65716 R14: b5caccc58918609a R15: ffffffffa0008b60
[    5.477107] FS:  0000000000000000(0000) GS:ffff88017c000000(0000) knlGS:0000000000000000
[    5.477107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.477107] CR2: 000000079fff5288 CR3: 0000000032613000 CR4: 00000000001406e0
[    5.477107] Stack:
[    5.477107]  ffff880032641780 ffff88003264102c 0010010000000046 ffffffffa000646e
[    5.477107]  ffff8801796e5090 ffff8801796e4480 00000000004f827d 0000000000000001
[    5.477107]  0000000000000000 ffff8801796ebce8 ffffffff810eaebc 00000000796e5058
[    5.477107] Call Trace:
[    5.477107]  [<ffffffff810eaebc>] ? __lock_acquire+0x3dc/0x730
[    5.477107]  [<ffffffffa0005263>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
[    5.477107]  [<ffffffffa0001371>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
[    5.653321]  [<ffffffff810abd1f>] process_one_work+0x1ff/0x6d0
[    5.653321]  [<ffffffff810abca1>] ? process_one_work+0x181/0x6d0
[    5.653321]  [<ffffffff810ac23e>] worker_thread+0x4e/0x490
[    5.653321]  [<ffffffff810ac1f0>] ? process_one_work+0x6d0/0x6d0
[    5.653321]  [<ffffffff810ac1f0>] ? process_one_work+0x6d0/0x6d0
[    5.653321]  [<ffffffff810b31b1>] kthread+0x101/0x120
[    5.653321]  [<ffffffff81739cef>] ret_from_fork+0x1f/0x40
[    5.653321]  [<ffffffff810b30b0>] ? kthread_create_on_node+0x250/0x250
[    5.653321] Code: 74 24 08 48 c7 c7 60 6c 00 a0 e8 0a 9e 1b e1 b8 10 00 00 00 66 89 44 24 16 44 89 e6 48 89 df e8 f6 f9 ff ff 41 8b 87 f4 02 00 00 <48> 8b 14 c5 80 12 03 a0 f0 ff 42 10 48 8b 42 08 a8 02 75 f8 0f 
[    5.653321] RIP  [<ffffffffa0004b91>] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.653321]  RSP <ffff8801796ebc50>
[    5.653321] CR2: 000000079fff5288
[    5.653321] ---[ end trace 62df6070997f1f10 ]---
[    5.653321] Kernel panic - not syncing: Fatal exception
[    5.653321] Kernel Offset: disabled
[    5.653321] ---[ end Kernel panic - not syncing: Fatal exception
[    5.653480] ------------[ cut here ]------------

I can investigate it tomorrow if this doesn't reproduce for you.

<skip>

-- 
  Vitaly
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel