On 28.06.2018 09:15, David Hildenbrand wrote: > On 27.06.2018 15:55, Janosch Frank wrote: > > Wonder if this patch should rather be "KVM: s390: ..." > >> From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> >> >> Now that we have everything in place, let's allow huge (1m) pmds for >> gmap linking, effectively allowing hugetlbfs backed guests. Transparent >> huge pages and 2g huge pages are *not* supported through this change. >> >> General KVM huge page support on s390 has to be enabled via the >> kvm.hpage module parameter. Either nested or hpage can be enabled, as >> we currently do not support vSIE for huge backed guests. > > You might want to extend a little bit on how this might be handled in > the future like "Once vSIE supports huge (1mb) pages, we can either drop > the module parameter or enable it as default". > >> >> For a guest the feature has to be enabled through the new >> KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling >> it will disable cmm, pfmf and storage key interpretation. > > You might want to add how this would complicates things and why we > disable them (no pgste on huge pages, but we might have to create fake > 4k page tables for huge pages, and we have to tell the HW that these > pgste are not to be used -> disable all assists that could make use of > the PGSTE and force it into the manual handler instead). CMM in general > not compatible. > >> >> Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> >> --- >> Documentation/virtual/kvm/api.txt | 16 +++++++++++++++ >> arch/s390/include/asm/mmu.h | 2 ++ >> arch/s390/include/asm/mmu_context.h | 1 + >> arch/s390/kvm/kvm-s390.c | 39 +++++++++++++++++++++++++++++++++++-- >> arch/s390/mm/gmap.c | 8 +++++--- >> include/uapi/linux/kvm.h | 1 + >> 6 files changed, 62 insertions(+), 5 deletions(-) >> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt >> index d10944e619d3..284acacea8c4 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -4391,6 +4391,22 @@ all such vmexits. >> >> Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits. >> >> +7.14 KVM_CAP_S390_HPAGE >> + >> +Architectures: s390 >> +Parameters: none >> +Returns: 0 on success, -EINVAL if hpage module parameter was not set >> + or cmma is enabled >> + >> +With this capability the KVM support for memory backing with 1m pages >> +through hugetlbfs can be enabled for a VM. This will disable cmma, > > It will never disable CMMA, as CMMA was not enabled before. "After the capability is enabled, cmma can't be enabled anymore and pfmfi and the storage key interpretation are disabled." > >> +pfmfi and the storage key interpretation. If cmma has already been >> +enabled or the hpage module parameter is not set to 1, -EINVAL is >> +returned. >> + >> +While it is generally possible to create and start a huge page backed >> +VM without this capability, the VM will not be functional. > > will crash when trying to use a huge page? Crash implies it ran at a certain point and that it'll burn down horribly, but QEMU will only pause it after the EFAULT if I remember it correctly. "While it is generally possible to create a huge page backed VM without this capability, the VM will not be able to run." > >> + >> 8. Other capabilities. >> ---------------------- >> >> diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h >> index f5ff9dbad8ac..652cd658e242 100644 >> --- a/arch/s390/include/asm/mmu.h >> +++ b/arch/s390/include/asm/mmu.h >> @@ -24,6 +24,8 @@ typedef struct { >> unsigned int uses_skeys:1; >> /* The mmu context uses CMM. */ >> unsigned int uses_cmm:1; >> + /* The gmap associated with this context uses huge pages. */ >> + unsigned int uses_gmap_hpage:1; > > uses implies "that we've seen a huge page", right? Shouldn't this be > rather allow_gmap_hpage? (or to be more precise allow_gmap_1mb_hpage) allow_gmap_hpage_1m We might want to prefix all of them with kvm_ in the future, so it's clear they belong to kvm. > >> } mm_context_t; >> >> #define INIT_MM_CONTEXT(name) \ >> diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h >> index d16bc79c30bb..407165d805b9 100644 >> --- a/arch/s390/include/asm/mmu_context.h >> +++ b/arch/s390/include/asm/mmu_context.h >> @@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk, >> mm->context.has_pgste = 0; >> mm->context.uses_skeys = 0; >> mm->context.uses_cmm = 0; >> + mm->context.uses_gmap_hpage = 0; >> #endif >> switch (mm->context.asce_limit) { >> case _REGION2_SIZE: >> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c >> index de8dec849b60..41d4bfa31be1 100644 >> --- a/arch/s390/kvm/kvm-s390.c >> +++ b/arch/s390/kvm/kvm-s390.c >> @@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext { >> static int nested; >> module_param(nested, int, S_IRUGO); >> MODULE_PARM_DESC(nested, "Nested virtualization support"); >> - >> +/* allow 1m huge page guest backing, if !nested */ >> +static int hpage; >> +module_param(hpage, int, 0444); > > What's the latest trend, 0444 vs S_IRUGO? checkpatch wants octals > >> +MODULE_PARM_DESC(hpage, "1m huge page backing support"); >> >> /* >> * For now we handle at most 16 double words as this is what the s390 base >> @@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) >> case KVM_CAP_S390_AIS_MIGRATION: >> r = 1; >> break; >> + case KVM_CAP_S390_HPAGE: >> + r = 0; >> + if (hpage) >> + r = 1; >> + break; >> case KVM_CAP_S390_MEM_OP: >> r = MEM_OP_MAX_SIZE; >> break; >> @@ -671,6 +679,24 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) >> VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s", >> r ? "(not available)" : "(success)"); >> break; >> + case KVM_CAP_S390_HPAGE: >> + mutex_lock(&kvm->lock); >> + if (kvm->created_vcpus) >> + r = -EBUSY; >> + else if (!hpage || kvm->arch.use_cmma) > > /* CMM relies on PGSTE to work, however huge pages have no PGSTE */ > > You should also check for "kvm->mm->context.uses_gmap_hpage" when > enabling CMMA using the capability. You mean the mem ctrl? I already do that, or do I miss something here? - if (!kvm->created_vcpus) { + if (kvm->created_vcpus) + ret = -EBUSY; + else if (kvm->mm->context.uses_gmap_hpage) + ret = -EINVAL; + else { kvm->arch.use_cmma = 1; > >> + r = -EINVAL; >> + else { >> + r = 0; >> + kvm->mm->context.uses_gmap_hpage = 1; >> + /* They would complicate matters too much. */ > > Dito, please describe why this is complicated (or even impossible) > > /* we might have to create fake 4k page tables and have to hinder the HW > from accessing these ... /* * We might have to create fake 4k page * tables. To avoid that the hardware works on * them we emulate these instructions. */ > >> + kvm->arch.use_skf = 0; >> + kvm->arch.use_cmma = 0; > > Not necessary, you have an kvm->arch.use_cmma check above > >> + kvm->arch.use_pfmfi = 0; >> + } >> + mutex_unlock(&kvm->lock); >> + VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s", >> + r ? "(not available)" : "(success)"); >> + break; >> case KVM_CAP_S390_USER_STSI: >> VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI"); >> kvm->arch.user_stsi = 1; >> @@ -721,7 +747,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att >> ret = -EBUSY; >> VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support"); >> mutex_lock(&kvm->lock); >> - if (!kvm->created_vcpus) { >> + if (kvm->created_vcpus) >> + ret = -EBUSY; >> + else if (kvm->mm->context.uses_gmap_hpage) >> + ret = -EINVAL; >> + else { >> kvm->arch.use_cmma = 1; >> /* Not compatible with cmma. */ >> kvm->arch.use_pfmfi = 0; >> @@ -4086,6 +4116,11 @@ static int __init kvm_s390_init(void) >> return -ENODEV; >> } >> >> + if (nested && hpage) { >> + pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently"); >> + return -EINVAL; >> + } >> + >> for (i = 0; i < 16; i++) >> kvm_s390_fac_base[i] |= >> S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i); >> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c >> index c691b9d9d223..84c6bad06949 100644 >> --- a/arch/s390/mm/gmap.c >> +++ b/arch/s390/mm/gmap.c >> @@ -2,8 +2,10 @@ >> /* >> * KVM guest address space mapping code >> * >> - * Copyright IBM Corp. 2007, 2016 >> + * Copyright IBM Corp. 2007, 2016, 2017 >> * Author(s): Martin Schwidefsky <schwidefsky@xxxxxxxxxx> >> + * David Hildenbrand <david@xxxxxxxxxx> >> + * Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> >> */ >> >> #include <linux/kernel.h> >> @@ -595,8 +597,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr) >> return -EFAULT; >> pmd = pmd_offset(pud, vmaddr); >> VM_BUG_ON(pmd_none(*pmd)); >> - /* large pmds cannot yet be handled */ >> - if (pmd_large(*pmd)) >> + /* Are we allowed to use huge pages? */ >> + if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage) >> return -EFAULT; >> /* Link gmap segment table entry location to page table. */ >> rc = radix_tree_preload(GFP_KERNEL); >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> index b6270a3b38e9..ea007e900acd 100644 >> --- a/include/uapi/linux/kvm.h >> +++ b/include/uapi/linux/kvm.h >> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt { >> #define KVM_CAP_GET_MSR_FEATURES 153 >> #define KVM_CAP_HYPERV_EVENTFD 154 >> #define KVM_CAP_HYPERV_TLBFLUSH 155 >> +#define KVM_CAP_S390_HPAGE 156 > > Should we even change that to KVM_CAP_S390_1MB_HPAGE ? I could live with KVM_CAP_S390_HPAGE_1M > >> >> #ifdef KVM_CAP_IRQ_ROUTING >> >> > >
Attachment:
signature.asc
Description: OpenPGP digital signature