Re: [PATCH v6 00/18] kvm: arm64: Dynamic IPA and 52bit IPA

Hi Suzuki,

On 9/26/18 6:32 PM, Suzuki K Poulose wrote:
> 
> The physical address space size for a VM (IPA size) on arm/arm64 is
> limited to a static limit of 40bits. This series adds support for
> using an IPA size specific to a VM, allowing the use of a size supported
> by the host (based on the host kernel configuration and CPU support).
> The default size is fixed to 40bits. On arm64, we can allow the limit
> to be lowered (limiting the number of levels in stage2 to 2, to prevent
> splitting the host PMD huge pages at stage2). We also add support for
> handling 52bit IPA addresses (where supported) added by Arm v8.2
> extensions.
> 
> We need to set the IPA limit as early as VM creation, which keeps the
> code simple and avoids sprinkling checks everywhere to ensure that the
> IPA is configured. We encode the IPA size in the machine_type
> argument to KVM_CREATE_VM ioctl. Bits [7-0] of the type are reserved
> for the IPA size. The availability of this feature is advertised by a
> new cap KVM_CAP_ARM_VM_IPA_SIZE. When supported, this capability
> returns the maximum IPA shift supported by the host. The supported IPA
> size on a host could be different from the system's PARange indicated
> by the CPUs (e.g., due to a kernel limit on the PA size).
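
The encoding described above can be sketched in a few lines of C. This is only an illustration of the rules stated in the cover letter (bits [7:0] of the type carry the IPA size, 0 means the 40bit default, and an explicit size above the host limit is refused). The macro and function names are hypothetical and are not the ones defined by the series, and any lower bound the kernel may enforce is not modelled.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not the series' uapi macros. */
#define IPA_SIZE_MASK     0xffULL  /* bits [7:0] of the VM type */
#define DEFAULT_IPA_BITS  40u      /* a type of 0 selects the 40bit default */

/* Build the machine type value for KVM_CREATE_VM from an IPA size in bits. */
static inline uint64_t vm_type_from_ipa_bits(unsigned int ipa_bits)
{
    return (uint64_t)ipa_bits & IPA_SIZE_MASK;
}

/* Decode the effective IPA size (in bits) from a machine type value. */
static inline unsigned int ipa_bits_from_vm_type(uint64_t type)
{
    unsigned int bits = (unsigned int)(type & IPA_SIZE_MASK);

    return bits ? bits : DEFAULT_IPA_BITS;
}

/*
 * Model of the v6 acceptance rule: "0" always succeeds for backward
 * compatibility, while an explicit size must not exceed the host limit
 * (the value reported by KVM_CAP_ARM_VM_IPA_SIZE).
 */
static inline int ipa_request_ok(uint64_t type, unsigned int host_limit)
{
    unsigned int raw = (unsigned int)(type & IPA_SIZE_MASK);

    if (raw == 0)
        return 1;
    return raw <= host_limit;
}
```

A userspace VMM would presumably query KVM_CAP_ARM_VM_IPA_SIZE with KVM_CHECK_EXTENSION on /dev/kvm first, then pass the encoded type to KVM_CREATE_VM.
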
> 
> Supporting different IPA sizes requires modifications to the stage2 page
> table code. The arm64 page table level helpers are defined based on the
> page table levels used by the host VA. So, the accessors may not work
> if the guest uses more levels in stage2 than the stage1
> of the host.  The previous versions (v1 & v2) of this series refactored
> the stage1 page table accessors to reuse the low-level accessors for an
> independent stage2 table. However, due to the level folding in the
> generic code, the types are redefined as well, i.e., if the PUD is
> folded, the pud_t could be defined as:
> 
>  typedef struct { pgd_t pgd; } pud_t;
> 
> and similarly for pmd_t.  So, without stage1-independent page table
> entry types for stage2, we could be dealing with a different type for
> level 0-2 entries. This is practically fine on arm/arm64, as the entries
> have similar format and size and we always use the appropriate
> accessors to get the raw value (i.e., pud_val/pmd_val etc.). But it is
> not ideal for an upstream solution. So, this version caps the stage2
> page table levels to that of the stage1. This has the following impact
> on the IPA support for various pagesize/host-va combinations:
> 
> 
> x-----------------------------------------------------x
> | host\ipa    | 40bit | 42bit | 44bit | 48bit | 52bit |
> -------------------------------------------------------
> | 39bit-4K    |  y    |   y   |  n    |   n   |  n/a  |
> -------------------------------------------------------
> | 48bit-4K    |  y    |   y   |  y    |   y   |  n/a  |
> -------------------------------------------------------
> | 36bit-16K   |  y    |   n   |  n    |   n   |  n/a  |
> -------------------------------------------------------
> | 47bit-16K   |  y    |   y   |  y    |   y   |  n/a  |
> -------------------------------------------------------
> | 48bit-16K   |  y    |   y   |  y    |   y   |  n/a  |
> -------------------------------------------------------
> | 42bit-64K   |  y    |   y   |  y    |   n   |  n    |
> -------------------------------------------------------
> | 48bit-64K   |  y    |   y   |  y    |   y   |  y    |
> x-----------------------------------------------------x
> 
> Or the following list shows what cannot be supported :
> 
>  39bit-4K host  | [44 - 48]
>  36bit-16K host | [41 - 48]
>  42bit-64K host | [47 - 52]
> 
> which is not really bad. We can pursue the independent stage2
> page table support and lift the restriction once we get there.
> Given there is a proposal for new generic page table walker [0],
> it would make sense to keep our efforts in sync with it to avoid
> diverging from a common API.
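
The support matrix above can be reproduced with a little arithmetic. The sketch below assumes 8-byte table entries (so each level resolves page_shift - 3 bits), that stage2 uses at most as many levels as the host stage1, and that up to 16 concatenated entry-level tables add 4 more bits. It also assumes 52bit is only reachable with 64K pages, as the n/a column suggests. The function names are made up for this example.

```c
/* Levels needed to translate va_bits with pages of size 1 << page_shift. */
static unsigned int pgtable_levels(unsigned int va_bits,
                                   unsigned int page_shift)
{
    unsigned int bits_per_level = page_shift - 3;  /* 8-byte entries */

    /* Round up: a partially used top level still counts as a level. */
    return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
}

/*
 * Largest IPA size (in bits) that stage2 can cover when it is capped to
 * the number of levels used by the host stage1.
 */
static unsigned int max_stage2_ipa_bits(unsigned int host_va_bits,
                                        unsigned int page_shift)
{
    unsigned int levels = pgtable_levels(host_va_bits, page_shift);
    /* +4: up to 16 concatenated tables at the entry level. */
    unsigned int bits = page_shift + levels * (page_shift - 3) + 4;
    /* Assumption: 52bit only with 64K pages, 48bit otherwise. */
    unsigned int arch_max = (page_shift == 16) ? 52 : 48;

    return bits < arch_max ? bits : arch_max;
}
```

Under these assumptions a 39bit-4K host tops out at 43 bits, a 36bit-16K host at 40 bits and a 42bit-64K host at 46 bits, which matches the unsupported ranges listed above.
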
> 
> 52bit support is added for VGIC (including ITS emulation) and handling
> of PAR, HPFAR registers.
> 
> The series applies on 4.19-rc4. A tree is available here:
> 
> 	 git://linux-arm.org/linux-skp.git ipa52/v6
> 
> Tested with
>   - Modified kvmtool, which can only be used for (patches included in
>     the series for reference / testing):
>     * with virtio-pci up to 44bit PA (due to 4K page size for virtio-pci
>       legacy implemented by kvmtool)
>     * up to 48bit PA with virtio-mmio, due to 32bit PFN limitation.
>   - Hacked Qemu (boot loader support for highmem, IPA size support)
>     * with virtio-pci GIC-v3 ITS & MSI up to 52bit on Foundation model.
>     Also see [1] for Qemu support.
> 
> [0] https://lkml.org/lkml/2018/4/24/777
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05759.html
> 
> Changes since V5:
>  - Don't raise the IPA Limit to 40bits on systems with lower PA size.
>    This doesn't break backward compatibility: we still allow
>    KVM_CREATE_VM to succeed with "0" as the IPA size (40bits), but
>    prevent specifying 40bit explicitly when the limit is lower.
>  - Rename CAP, KVM_CAP_ARM_VM_PHYS_SHIFT => KVM_CAP_ARM_VM_IPA_SIZE
>    and helper, KVM_VM_TYPE_ARM_VM_PHY_SHIFT => KVM_VM_TYPE_ARM_VM_IPA_SIZE
>  - Update Documentation of the API
>  - Update comments and commit description as reported by Eric
>  - Set the missing TCR_T0SZ in patch "kvm: arm64: Configure VTCR_EL2 per VM"
>  - Fix bits for CBASER_ADDRESS mask, GITS_CBASER_ADDRESS()
> 
> Changes since V4:
>  - Rebased on v4.19-rc3
>  - Dropped virtio patches queued already by mst.
>  - Collect Acks from Christoffer
>  - Restrict IPA configuration support to arm64 only
>  - Use KVM_CAP_ARM_VM_PHYS_SHIFT for detecting the support for
>    IPA size configuration along with the limit on the IPA for the host.
>  - Update comments on __load_guest_stage2
>  - Add comment about the default value for unknown PARange values.
>  - Update Documentation of the API
> 
> Changes since V3:
>  - Use per-VM VTCR instead of per-VM private VTCR bits
>  - Allow IPA less than 40bits
>  - Split the patch adding support for stage2 dynamic page tables
>  - Rearrange the series to keep the userspace API at the end, which
>    needs further discussion.
>  - Collect Reviews/Acks from Eric & Marc
> 
> Changes since V2:
>  - Drop "refactoring of host page table helpers" and restrict the IPA size
>    to make sure stage2 doesn't use more page table levels than that of the host.
>  - Load VTCR for TLB operations on behalf of the VM (Pointed-by: James Morse)
>  - Split a couple of patches to make them easier to review.
>  - Fall back to normal (non-concatenated) entry level page table support if
>    possible.
>  - Bump the IOCTL number
> 
> Changes since V1:
>  - Change the userspace API for configuring VM to encode the IPA
>    size in the VM type.  (suggested by Christoffer)
>  - Expose the IPA limit on the host via ioctl on /dev/kvm
>  - Handle 52bit addresses in PAR & HPFAR
>  - Drop patch changing the life time of stage2 PGD
>  - Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
>    (suggested by Christoffer)
>  - Split virtio PFN check patches and address comments.
> 
> 
> Kristina Martsenko (1):
>   vgic: Add support for 52bit guest physical address
> 
> Suzuki K Poulose (17):
>   kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
>   kvm: arm/arm64: Remove spurious WARN_ON
>   kvm: arm64: Add helper for loading the stage2 setting for a VM
>   arm64: Add a helper for PARange to physical shift conversion
>   kvm: arm64: Clean up VTCR_EL2 initialisation
>   kvm: arm/arm64: Allow arch specific configurations for VM
>   kvm: arm64: Configure VTCR_EL2 per VM
>   kvm: arm/arm64: Prepare for VM specific stage2 translations
>   kvm: arm64: Prepare for dynamic stage2 page table layout
>   kvm: arm64: Make stage2 page table layout dynamic
>   kvm: arm64: Dynamic configuration of VTTBR mask
>   kvm: arm64: Configure VTCR_EL2.SL0 per VM
>   kvm: arm64: Switch to per VM IPA limit
>   kvm: arm64: Add 52bit support for PAR to HPFAR conversoin
>   kvm: arm64: Set a limit on the IPA size
>   kvm: arm64: Limit the minimum number of page table levels
>   kvm: arm64: Allow tuning the physical address size for VM
> 
>  Documentation/virtual/kvm/api.txt             |  31 +++
>  arch/arm/include/asm/kvm_arm.h                |   3 +-
>  arch/arm/include/asm/kvm_host.h               |   7 +
>  arch/arm/include/asm/kvm_mmu.h                |  15 +-
>  arch/arm/include/asm/stage2_pgtable.h         |  50 ++--
>  arch/arm64/include/asm/cpufeature.h           |  20 ++
>  arch/arm64/include/asm/kvm_arm.h              | 157 +++++++++---
>  arch/arm64/include/asm/kvm_asm.h              |   2 -
>  arch/arm64/include/asm/kvm_host.h             |  16 +-
>  arch/arm64/include/asm/kvm_hyp.h              |  10 +
>  arch/arm64/include/asm/kvm_mmu.h              |  42 +++-
>  arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 ----
>  arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 ---
>  arch/arm64/include/asm/stage2_pgtable.h       | 236 +++++++++++++-----
>  arch/arm64/kvm/hyp/Makefile                   |   1 -
>  arch/arm64/kvm/hyp/s2-setup.c                 |  90 -------
>  arch/arm64/kvm/hyp/switch.c                   |   4 +-
>  arch/arm64/kvm/hyp/tlb.c                      |   4 +-
>  arch/arm64/kvm/reset.c                        | 103 ++++++++
>  include/linux/irqchip/arm-gic-v3.h            |   5 +
>  include/uapi/linux/kvm.h                      |  10 +
>  virt/kvm/arm/arm.c                            |   9 +-
>  virt/kvm/arm/mmu.c                            | 120 ++++-----
>  virt/kvm/arm/vgic/vgic-its.c                  |  36 +--
>  virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
>  virt/kvm/arm/vgic/vgic-mmio-v3.c              |   2 -
>  26 files changed, 648 insertions(+), 408 deletions(-)
>  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>  delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
> 
> kvmtool changes:
> 
> Suzuki K Poulose (4):
>   kvmtool: Allow backends to run checks on the KVM device fd
>   kvmtool: arm64: Add support for guest physical address size
>   kvmtool: arm64: Switch memory layout
>   kvmtool: arm: Add support for creating VM with PA size
> 
>  arm/aarch32/include/kvm/kvm-arch.h        |  6 ++--
>  arm/aarch64/include/kvm/kvm-arch.h        | 15 ++++++++--
>  arm/aarch64/include/kvm/kvm-config-arch.h |  5 +++-
>  arm/include/arm-common/kvm-arch.h         | 17 ++++++++----
>  arm/include/arm-common/kvm-config-arch.h  |  1 +
>  arm/kvm.c                                 | 34 ++++++++++++++++++++++-
>  include/kvm/kvm.h                         |  4 +++
>  kvm.c                                     |  2 ++
>  8 files changed, 71 insertions(+), 13 deletions(-)
> 

Feel free to add
Tested-by: Eric Auger <eric.auger@xxxxxxxxxx>

I tested this series with QEMU, using a cold-plugged 4GB PC-DIMM at 2TB
on a Gigabyte machine. The VM was created with 43 IPA bits. I ran
memtester in the guest at 2TB using "memtester -p 20000000000 1G 1" and
it succeeded.
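
As a sanity check on these numbers, the sketch below computes how many address bits a memory region needs. It reads the -p argument as hex (0x20000000000, i.e. 2TB), which is consistent with the "at 2TB" description. The helper name is made up for this example.

```c
#include <stdint.h>

/* Number of address bits needed to reach the last byte of a region. */
static unsigned int addr_bits_needed(uint64_t base, uint64_t size)
{
    uint64_t last = base + size - 1;  /* highest byte address in the region */
    unsigned int bits = 0;

    while (last) {
        bits++;
        last >>= 1;
    }
    return bits;
}
```

A 4GB DIMM based at 2TB ends just below 0x20100000000, which needs 42 address bits, so it fits within a 43bit IPA space.
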

Thanks

Eric


