Re: [PATCH] VMCI: Release resource if the work is already queued

On 8/20/19, 8:48 PM, "Nadav Amit" <namit@xxxxxxxxxx> wrote:
> Francois reported that VMware balloon gets stuck after a balloon reset,
> when the VMCI doorbell is removed. A similar error can occur when the
> balloon driver is removed with the following splat:
> 
> [ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
> [ 1088.622035]       Tainted: G        W         5.2.0 #4
> [ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1088.622205] modprobe        D    0  3565   1450 0x00000000
> [ 1088.622210] Call Trace:
> [ 1088.622246]  __schedule+0x2a8/0x690
> [ 1088.622248]  schedule+0x2d/0x90
> [ 1088.622250]  schedule_timeout+0x1d3/0x2f0
> [ 1088.622252]  wait_for_completion+0xba/0x140
> [ 1088.622320]  ? wake_up_q+0x80/0x80
> [ 1088.622370]  vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
> [ 1088.622373]  vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
> [ 1088.622379]  vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
> [ 1088.622381]  vmballoon_exit+0x18/0xcc8 [vmw_balloon]
> [ 1088.622394]  __x64_sys_delete_module+0x146/0x280
> [ 1088.622408]  do_syscall_64+0x5a/0x130
> [ 1088.622410]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1088.622415] RIP: 0033:0x7f54f62791b7
> [ 1088.622421] Code: Bad RIP value.
> [ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
> [ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
> [ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
> [ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
> [ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0
> 
> The cause of the bug is that when the "delayed" doorbell is invoked, it
> takes a reference on the doorbell entry and schedules work that is
> supposed to run the appropriate code and drop the doorbell entry
> reference. The code ignores the fact that if the work is already queued,
> it will not be scheduled to run one more time. As a result, one of the
> references is never dropped. When the code later waits for the
> reference count to reach zero, during balloon reset or module removal,
> it gets stuck.
>
> Fix it. Drop the reference if schedule_work() indicates that the work is
> already queued.
>
> Note that this bug became more apparent (or apparent at all) due to
> commit ce664331b248 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").
>
> Fixes: 83e2ec765be03 ("VMCI: doorbell implementation.")
> Reported-by: Francois Rigault <rigault.francois@xxxxxxxxx>
> Cc: Jorgen Hansen <jhansen@xxxxxxxxxx>
> Cc: Adit Ranadive <aditr@xxxxxxxxxx>
> Cc: Alexios Zavras <alexios.zavras@xxxxxxxxx>
> Cc: Vishnu DASA <vdasa@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
> ---
> drivers/misc/vmw_vmci/vmci_doorbell.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
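
For readers following along, the leak described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the vmw_vmci code: struct entry, struct work, model_schedule_work(), doorbell_fire() and leaked_refs() are hypothetical names. The one property taken from the patch description is that schedule_work() reports when the work is already queued, and that the reference must be dropped in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real driver types. */
struct work { bool pending; };
struct entry { int refs; struct work work; };

/* Models schedule_work(): returns false if the work was already queued. */
bool model_schedule_work(struct work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

void entry_get(struct entry *e) { e->refs++; }

void entry_put(struct entry *e)
{
    assert(e->refs > 0);
    e->refs--;
}

/* Runs the queued work once and drops the single reference taken for it. */
void model_run_work(struct entry *e)
{
    if (!e->work.pending)
        return;
    e->work.pending = false;
    entry_put(e);
}

/* A "delayed" doorbell fires: take a reference, then queue the work.
 * With the fix, the reference is dropped when the work was already queued. */
void doorbell_fire(struct entry *e, bool with_fix)
{
    entry_get(e);
    if (!model_schedule_work(&e->work) && with_fix)
        entry_put(e);
}

/* Fires the doorbell twice before the work runs and returns how many
 * references remain beyond the initial one held by the owner. */
int leaked_refs(bool with_fix)
{
    struct entry e = { .refs = 1, .work = { .pending = false } };

    doorbell_fire(&e, with_fix);
    doorbell_fire(&e, with_fix); /* work is still pending here */
    model_run_work(&e);
    return e.refs - 1;
}
```

In this model, leaked_refs(false) leaves one extra reference behind, which corresponds to the wait_for_completion() in the trace never finishing, while leaked_refs(true) leaves none.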

Thanks for the fix, looks good to me.
Reviewed-by: Vishnu Dasa <vdasa@xxxxxxxxxx>

--
vishnu
