Re: [PATCH 5.10] KVM: x86: Properly handle APF vs disabled LAPIC situation

On Tue, May 24, 2022 at 02:42:04PM +0800, Guoqing Jiang wrote:
> From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> 
> Backport of commit 2f15d027c05fac406decdb5eceb9ec0902b68f53 upstream.
> 
> Async PF 'page ready' event may happen when LAPIC is (temporarily) disabled.
> In particular, Sebastien reports that when Linux kernel is directly booted
> by Cloud Hypervisor, LAPIC is 'software disabled' when APF mechanism is
> initialized. On initialization KVM tries to inject 'wakeup all' event and
> puts the corresponding token to the slot. It is, however, failing to inject
> an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled())
> so the guest never gets notified and the whole APF mechanism gets stuck.
> The same issue is likely to happen if the guest temporarily disables LAPIC
> and a previously unavailable page becomes available.
> 
> Do two things to resolve the issue:
> - Avoid dequeuing 'page ready' events from APF queue when LAPIC is
>   disabled.
> - Trigger an attempt to deliver pending 'page ready' events when LAPIC
>   becomes enabled (SPIV or MSR_IA32_APICBASE).
> 
> Reported-by: Sebastien Boeuf <sebastien.boeuf@xxxxxxxxx>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Message-Id: <20210422092948.568327-1-vkuznets@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> [Guoqing: backport to 5.10-stable ]
> Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> ---
> Hi,
> 
> We encountered the task hang below with the 5.10.113 stable kernel.
> 
> [  246.845061] INFO: task rpmbuild:2303 blocked for more than 122 seconds.
> [  246.846269]       Not tainted 5.10.113-1.1.se2-default #1
> [  246.847103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  246.848248] task:rpmbuild        state:D stack:    0 pid: 2303 ppid:  2302 flags:0x00000000
> [  246.848252] Call Trace:
> [  246.848266]  __schedule+0x3f6/0x7c0
> [  246.848289]  ? __handle_mm_fault+0x3dd/0x6d0
> [  246.848291]  schedule+0x46/0xb0
> [  246.848295]  kvm_async_pf_task_wait_schedule+0x4b/0x90
> [  246.848297]  ? handle_mm_fault+0xbc/0x280
> [  246.848300]  __kvm_handle_async_pf+0x4f/0xb0
> [  246.848303]  exc_page_fault+0x204/0x540
> [  246.848305]  ? asm_exc_page_fault+0x8/0x30
> [  246.848307]  asm_exc_page_fault+0x1e/0x30
> [  246.848310] RIP: 0033:0x7f122fbdfc90
> 
> After investigation, this patch resolves the issue. The 5.12 stable kernel
> has already merged it as commit 36825931c607.

Now queued up, thanks.

greg k-h
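
For readers following the thread, the two-part fix described in the quoted
commit message can be illustrated with a small user-space model. This is only
a sketch, not the KVM code: every name here (struct toy_vcpu, toy_page_ready(),
toy_set_lapic_enabled(), and so on) is hypothetical, and the queue and slot are
simplified stand-ins for the real APF machinery. It shows the two rules only:
'page ready' events stay queued while the LAPIC is disabled, and enabling the
LAPIC triggers a new delivery attempt.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical, simplified stand-in for per-vCPU APF state. */
struct toy_vcpu {
	bool lapic_enabled;                /* can the guest take the interrupt? */
	unsigned int done[TOY_QUEUE_MAX];  /* tokens of completed ("ready") pages */
	size_t head, count;                /* ring buffer bookkeeping */
	bool slot_busy;                    /* token written, guest has not acked */
	unsigned int slot_token;           /* token currently exposed to the guest */
	unsigned int irqs_delivered;       /* how many notifications went out */
};

/* Rule 1: never dequeue while the LAPIC cannot accept the interrupt. */
static bool toy_can_dequeue(const struct toy_vcpu *v)
{
	return v->lapic_enabled && !v->slot_busy;
}

/* Move one completed event into the slot and "inject" the interrupt. */
static void toy_check_completion(struct toy_vcpu *v)
{
	if (v->count == 0 || !toy_can_dequeue(v))
		return;
	v->slot_token = v->done[v->head];
	v->head = (v->head + 1) % TOY_QUEUE_MAX;
	v->count--;
	v->slot_busy = true;
	v->irqs_delivered++;
}

/* A page became available: queue the event, then try to deliver it. */
static bool toy_page_ready(struct toy_vcpu *v, unsigned int token)
{
	if (v->count == TOY_QUEUE_MAX)
		return false;
	v->done[(v->head + v->count) % TOY_QUEUE_MAX] = token;
	v->count++;
	toy_check_completion(v);
	return true;
}

/* Rule 2: a disabled -> enabled transition retries pending deliveries. */
static void toy_set_lapic_enabled(struct toy_vcpu *v, bool on)
{
	bool was_enabled = v->lapic_enabled;

	v->lapic_enabled = on;
	if (on && !was_enabled)
		toy_check_completion(v);
}

/* Guest acknowledged the slot: free it and deliver the next event, if any. */
static void toy_guest_ack(struct toy_vcpu *v)
{
	v->slot_busy = false;
	toy_check_completion(v);
}
```

In this model, an event queued while lapic_enabled is false is neither placed
in the slot nor counted as delivered; it is delivered only once
toy_set_lapic_enabled(v, true) runs. Without the check in toy_can_dequeue(),
the token would occupy the slot with no interrupt ever reaching the guest,
which is the stuck state the commit message describes.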


