On 27.06.2018 15:55, Janosch Frank wrote: Wonder if this patch should rather be "KVM: s390: ..." > From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> > > Now that we have everything in place, let's allow huge (1m) pmds for > gmap linking, effectively allowing hugetlbfs backed guests. Transparent > huge pages and 2g huge pages are *not* supported through this change. > > General KVM huge page support on s390 has to be enabled via the > kvm.hpage module parameter. Either nested or hpage can be enabled, as > we currently do not support vSIE for huge backed guests. You might want to extend a little bit on how this might be handled in the future like "Once vSIE supports huge (1mb) pages, we can either drop the module parameter or enable it as default". > > For a guest the feature has to be enabled through the new > KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling > it will disable cmm, pfmf and storage key interpretation. You might want to add how this would complicates things and why we disable them (no pgste on huge pages, but we might have to create fake 4k page tables for huge pages, and we have to tell the HW that these pgste are not to be used -> disable all assists that could make use of the PGSTE and force it into the manual handler instead). CMM in general not compatible. > > Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> > --- > Documentation/virtual/kvm/api.txt | 16 +++++++++++++++ > arch/s390/include/asm/mmu.h | 2 ++ > arch/s390/include/asm/mmu_context.h | 1 + > arch/s390/kvm/kvm-s390.c | 39 +++++++++++++++++++++++++++++++++++-- > arch/s390/mm/gmap.c | 8 +++++--- > include/uapi/linux/kvm.h | 1 + > 6 files changed, 62 insertions(+), 5 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index d10944e619d3..284acacea8c4 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4391,6 +4391,22 @@ all such vmexits. > > Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits. > > +7.14 KVM_CAP_S390_HPAGE > + > +Architectures: s390 > +Parameters: none > +Returns: 0 on success, -EINVAL if hpage module parameter was not set > + or cmma is enabled > + > +With this capability the KVM support for memory backing with 1m pages > +through hugetlbfs can be enabled for a VM. This will disable cmma, It will never disable CMMA, as CMMA was not enabled before. > +pfmfi and the storage key interpretation. If cmma has already been > +enabled or the hpage module parameter is not set to 1, -EINVAL is > +returned. > + > +While it is generally possible to create and start a huge page backed > +VM without this capability, the VM will not be functional. will crash when trying to use a huge page? > + > 8. Other capabilities. > ---------------------- > > diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h > index f5ff9dbad8ac..652cd658e242 100644 > --- a/arch/s390/include/asm/mmu.h > +++ b/arch/s390/include/asm/mmu.h > @@ -24,6 +24,8 @@ typedef struct { > unsigned int uses_skeys:1; > /* The mmu context uses CMM. */ > unsigned int uses_cmm:1; > + /* The gmap associated with this context uses huge pages. */ > + unsigned int uses_gmap_hpage:1; uses implies "that we've seen a huge page", right? Shouldn't this be rather allow_gmap_hpage? (or to be more precise allow_gmap_1mb_hpage) > } mm_context_t; > > #define INIT_MM_CONTEXT(name) \ > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h > index d16bc79c30bb..407165d805b9 100644 > --- a/arch/s390/include/asm/mmu_context.h > +++ b/arch/s390/include/asm/mmu_context.h > @@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk, > mm->context.has_pgste = 0; > mm->context.uses_skeys = 0; > mm->context.uses_cmm = 0; > + mm->context.uses_gmap_hpage = 0; > #endif > switch (mm->context.asce_limit) { > case _REGION2_SIZE: > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c > index de8dec849b60..41d4bfa31be1 100644 > --- a/arch/s390/kvm/kvm-s390.c > +++ b/arch/s390/kvm/kvm-s390.c > @@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext { > static int nested; > module_param(nested, int, S_IRUGO); > MODULE_PARM_DESC(nested, "Nested virtualization support"); > - > +/* allow 1m huge page guest backing, if !nested */ > +static int hpage; > +module_param(hpage, int, 0444); What's the latest trend, 0444 vs S_IRUGO? > +MODULE_PARM_DESC(hpage, "1m huge page backing support"); > > /* > * For now we handle at most 16 double words as this is what the s390 base > @@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_S390_AIS_MIGRATION: > r = 1; > break; > + case KVM_CAP_S390_HPAGE: > + r = 0; > + if (hpage) > + r = 1; > + break; > case KVM_CAP_S390_MEM_OP: > r = MEM_OP_MAX_SIZE; > break; > @@ -671,6 +679,24 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) > VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s", > r ? "(not available)" : "(success)"); > break; > + case KVM_CAP_S390_HPAGE: > + mutex_lock(&kvm->lock); > + if (kvm->created_vcpus) > + r = -EBUSY; > + else if (!hpage || kvm->arch.use_cmma) /* CMM relies on PGSTE to work, however huge pages have no PGSTE */ You should also check for "kvm->mm->context.uses_gmap_hpage" when enabling CMMA using the capability. > + r = -EINVAL; > + else { > + r = 0; > + kvm->mm->context.uses_gmap_hpage = 1; > + /* They would complicate matters too much. */ Dito, please describe why this is complicated (or even impossible) /* we might have to create fake 4k page tables and have to hinder the HW from accessing these ... > + kvm->arch.use_skf = 0; > + kvm->arch.use_cmma = 0; Not necessary, you have an kvm->arch.use_cmma check above > + kvm->arch.use_pfmfi = 0; > + } > + mutex_unlock(&kvm->lock); > + VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s", > + r ? "(not available)" : "(success)"); > + break; > case KVM_CAP_S390_USER_STSI: > VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI"); > kvm->arch.user_stsi = 1; > @@ -721,7 +747,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att > ret = -EBUSY; > VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support"); > mutex_lock(&kvm->lock); > - if (!kvm->created_vcpus) { > + if (kvm->created_vcpus) > + ret = -EBUSY; > + else if (kvm->mm->context.uses_gmap_hpage) > + ret = -EINVAL; > + else { > kvm->arch.use_cmma = 1; > /* Not compatible with cmma. */ > kvm->arch.use_pfmfi = 0; > @@ -4086,6 +4116,11 @@ static int __init kvm_s390_init(void) > return -ENODEV; > } > > + if (nested && hpage) { > + pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently"); > + return -EINVAL; > + } > + > for (i = 0; i < 16; i++) > kvm_s390_fac_base[i] |= > S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i); > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > index c691b9d9d223..84c6bad06949 100644 > --- a/arch/s390/mm/gmap.c > +++ b/arch/s390/mm/gmap.c > @@ -2,8 +2,10 @@ > /* > * KVM guest address space mapping code > * > - * Copyright IBM Corp. 2007, 2016 > + * Copyright IBM Corp. 2007, 2016, 2017 > * Author(s): Martin Schwidefsky <schwidefsky@xxxxxxxxxx> > + * David Hildenbrand <david@xxxxxxxxxx> > + * Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> > */ > > #include <linux/kernel.h> > @@ -595,8 +597,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr) > return -EFAULT; > pmd = pmd_offset(pud, vmaddr); > VM_BUG_ON(pmd_none(*pmd)); > - /* large pmds cannot yet be handled */ > - if (pmd_large(*pmd)) > + /* Are we allowed to use huge pages? */ > + if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage) > return -EFAULT; > /* Link gmap segment table entry location to page table. */ > rc = radix_tree_preload(GFP_KERNEL); > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index b6270a3b38e9..ea007e900acd 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_GET_MSR_FEATURES 153 > #define KVM_CAP_HYPERV_EVENTFD 154 > #define KVM_CAP_HYPERV_TLBFLUSH 155 > +#define KVM_CAP_S390_HPAGE 156 Should we even change that to KVM_CAP_S390_1MB_HPAGE ? > > #ifdef KVM_CAP_IRQ_ROUTING > > -- Thanks, David / dhildenb