Re: [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Alexey,

On Fri, 2021-08-27 at 14:07 +1000, Alexey Kardashevskiy wrote:
> Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
> only when needed. This allows skipping any update when clearing TCEs.
> This works mostly fine as TCE updates are handled when MMU is enabled.
> The realmode handlers fail with H_TOO_HARD when pages are not yet
> allocated except when clearing a TCE in which case KVM prints a warning
> but proceeds to dereference a NULL pointer which crashes the host OS.
> 
> This has not been caught so far as the change is reasonably new,
> POWER9 runs mostly radix which does not use realmode handlers.
> With hash, the default TCE table is memset() by QEMU the machine reset
> which triggers page faults and the KVM TCE device's
> kvm_spapr_tce_fault()
> handles those with MMU on. And the huge DMA windows are not cleared
> by VMs whicn instead successfully create a DMA window big enough to map
> the VM memory 1:1 and then VMs just map everything without clearing.
> 
> This started crashing now as upcoming sriov-under-powervm support added
> a mode when a dymanic DMA window not big enough to map the VM memory
> 1:1
> but it is used anyway, and the VM now is the first (i.e. not QEMU) to
> clear a just created table. Note that the upstream QEMU needs to be
> modified to trigger the VM to trigger the host OS crash.
> 
> This replaces WARN_ON_ONCE_RM() with a check and return.
> This adds another warning if TCE is not being cleared.
> 
> Cc: Leonardo Bras <leobras.c@xxxxxxxxx>
> Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand
> too")
> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> ---
> 
> With recent changes in the printk() department, calling pr_err() when
> MMU
> off causes lockdep lockups which I did not dig any further so we should
> start getting rid of the realmode's WARN_ON_ONCE_RM().
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 083a4e037718..e5ba96c41f3f 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct
> kvmppc_spapr_tce_table *stt,
>         idx -= stt->offset;
>         page = stt->pages[idx / TCES_PER_PAGE];
>         /*
> -        * page must not be NULL in real mode,
> -        * kvmppc_rm_ioba_validate() must have taken care of this.
> +        * kvmppc_rm_ioba_validate() allows pages not be allocated if
> TCE is
> +        * being cleared, otherwise it returns H_TOO_HARD and we skip
> this.
>          */
> -       WARN_ON_ONCE_RM(!page);
> +       if (!page) {
> +               WARN_ON_ONCE_RM(tce != 0);
> +               return;
> +       }
>         tbl = kvmppc_page_address(page);
>  
>         tbl[idx % TCES_PER_PAGE] = tce;

That looks reasonable IMHO.

FWIW:
Reviewed-by: Leonardo Bras <leobras.c@xxxxxxxxx>




[Index of Archives]     [KVM Development]     [KVM ARM]     [KVM ia64]     [Linux Virtualization]     [Linux USB Devel]     [Linux Video]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux