Re: [PATCH v4 9/9] s390/mm: Enable gmap huge pmd support

Janosch Frank <frankja@xxxxxxxxxxxxx> · Thu, 28 Jun 2018 17:24:42 +0200

On 28.06.2018 09:15, David Hildenbrand wrote:
> On 27.06.2018 15:55, Janosch Frank wrote:
> 
> Wonder if this patch should rather be "KVM: s390: ..."
> 
>> From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
>>
>> Now that we have everything in place, let's allow huge (1m) pmds for
>> gmap linking, effectively allowing hugetlbfs backed guests. Transparent
>> huge pages and 2g huge pages are *not* supported through this change.
>>
>> General KVM huge page support on s390 has to be enabled via the
>> kvm.hpage module parameter. Either nested or hpage can be enabled, as
>> we currently do not support vSIE for huge backed guests.
> 
> You might want to extend a little bit on how this might be handled in
> the future like "Once vSIE supports huge (1mb) pages, we can either drop
> the module parameter or enable it as default".
> 
>>
>> For a guest the feature has to be enabled through the new
>> KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling
>> it will disable cmm, pfmf and storage key interpretation.
> 
> You might want to add how this would complicates things and why we
> disable them (no pgste on huge pages, but we might have to create fake
> 4k page tables for huge pages, and we have to tell the HW that these
> pgste are not to be used -> disable all assists that could make use of
> the PGSTE and force it into the manual handler instead). CMM in general
> not compatible.
> 
>>
>> Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
>> ---
>>  Documentation/virtual/kvm/api.txt   | 16 +++++++++++++++
>>  arch/s390/include/asm/mmu.h         |  2 ++
>>  arch/s390/include/asm/mmu_context.h |  1 +
>>  arch/s390/kvm/kvm-s390.c            | 39 +++++++++++++++++++++++++++++++++++--
>>  arch/s390/mm/gmap.c                 |  8 +++++---
>>  include/uapi/linux/kvm.h            |  1 +
>>  6 files changed, 62 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index d10944e619d3..284acacea8c4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4391,6 +4391,22 @@ all such vmexits.
>>  
>>  Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
>>  
>> +7.14 KVM_CAP_S390_HPAGE
>> +
>> +Architectures: s390
>> +Parameters: none
>> +Returns: 0 on success, -EINVAL if hpage module parameter was not set
>> +	 or cmma is enabled
>> +
>> +With this capability the KVM support for memory backing with 1m pages
>> +through hugetlbfs can be enabled for a VM. This will disable cmma,
> 
> It will never disable CMMA, as CMMA was not enabled before.

"After the capability is enabled, cmma can't be enabled anymore and
pfmfi and the storage key interpretation are disabled."

> 
>> +pfmfi and the storage key interpretation. If cmma has already been
>> +enabled or the hpage module parameter is not set to 1, -EINVAL is
>> +returned.
>> +
>> +While it is generally possible to create and start a huge page backed
>> +VM without this capability, the VM will not be functional.
> 
> will crash when trying to use a huge page?

Crash implies it ran at a certain point and that it'll burn down
horribly, but QEMU will only pause it after the EFAULT if I remember it
correctly.

"While it is generally possible to create a huge page backed
VM without this capability, the VM will not be able to run."

> 
>> +
>>  8. Other capabilities.
>>  ----------------------
>>  
>> diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
>> index f5ff9dbad8ac..652cd658e242 100644
>> --- a/arch/s390/include/asm/mmu.h
>> +++ b/arch/s390/include/asm/mmu.h
>> @@ -24,6 +24,8 @@ typedef struct {
>>  	unsigned int uses_skeys:1;
>>  	/* The mmu context uses CMM. */
>>  	unsigned int uses_cmm:1;
>> +	/* The gmap associated with this context uses huge pages. */
>> +	unsigned int uses_gmap_hpage:1;
> 
> uses implies "that we've seen a huge page", right? Shouldn't this be
> rather allow_gmap_hpage? (or to be more precise allow_gmap_1mb_hpage)

allow_gmap_hpage_1m

We might want to prefix all of them with kvm_ in the future, so it's
clear they belong to kvm.

> 
>>  } mm_context_t;
>>  
>>  #define INIT_MM_CONTEXT(name)						   \
>> diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
>> index d16bc79c30bb..407165d805b9 100644
>> --- a/arch/s390/include/asm/mmu_context.h
>> +++ b/arch/s390/include/asm/mmu_context.h
>> @@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk,
>>  	mm->context.has_pgste = 0;
>>  	mm->context.uses_skeys = 0;
>>  	mm->context.uses_cmm = 0;
>> +	mm->context.uses_gmap_hpage = 0;
>>  #endif
>>  	switch (mm->context.asce_limit) {
>>  	case _REGION2_SIZE:
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index de8dec849b60..41d4bfa31be1 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext {
>>  static int nested;
>>  module_param(nested, int, S_IRUGO);
>>  MODULE_PARM_DESC(nested, "Nested virtualization support");
>> -
>> +/* allow 1m huge page guest backing, if !nested */
>> +static int hpage;
>> +module_param(hpage, int, 0444);
> 
> What's the latest trend, 0444 vs S_IRUGO?

checkpatch wants octals

> 
>> +MODULE_PARM_DESC(hpage, "1m huge page backing support");
>>  
>>  /*
>>   * For now we handle at most 16 double words as this is what the s390 base
>> @@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  	case KVM_CAP_S390_AIS_MIGRATION:
>>  		r = 1;
>>  		break;
>> +	case KVM_CAP_S390_HPAGE:
>> +		r = 0;
>> +		if (hpage)
>> +			r = 1;
>> +		break;
>>  	case KVM_CAP_S390_MEM_OP:
>>  		r = MEM_OP_MAX_SIZE;
>>  		break;
>> @@ -671,6 +679,24 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>  		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
>>  			 r ? "(not available)" : "(success)");
>>  		break;
>> +	case KVM_CAP_S390_HPAGE:
>> +		mutex_lock(&kvm->lock);
>> +		if (kvm->created_vcpus)
>> +			r = -EBUSY;
>> +		else if (!hpage || kvm->arch.use_cmma)
> 
> /* CMM relies on PGSTE to work, however huge pages have no PGSTE */
> 
> You should also check for "kvm->mm->context.uses_gmap_hpage" when
> enabling CMMA using the capability.

You mean the mem ctrl? I already do that, or do I miss something here?

-		if (!kvm->created_vcpus) {
+		if (kvm->created_vcpus)
+			ret = -EBUSY;
+		else if (kvm->mm->context.uses_gmap_hpage)
+			ret = -EINVAL;
+		else {
 			kvm->arch.use_cmma = 1;

> 
>> +			r = -EINVAL;
>> +		else {
>> +			r = 0;
>> +			kvm->mm->context.uses_gmap_hpage = 1;
>> +			/* They would complicate matters too much. */
> 
> Dito, please describe why this is complicated (or even impossible)
> 
> /* we might have to create fake 4k page tables and have to hinder the HW
>    from accessing these ...

                        /*

                         * We might have to create fake 4k page

                         * tables. To avoid that the hardware works on

                         * them we emulate these instructions.

                         */

> 
>> +			kvm->arch.use_skf = 0;
>> +			kvm->arch.use_cmma = 0;
> 
> Not necessary, you have an kvm->arch.use_cmma check above
> 
>> +			kvm->arch.use_pfmfi = 0;
>> +		}
>> +		mutex_unlock(&kvm->lock);
>> +		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
>> +			 r ? "(not available)" : "(success)");
>> +		break;
>>  	case KVM_CAP_S390_USER_STSI:
>>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
>>  		kvm->arch.user_stsi = 1;
>> @@ -721,7 +747,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
>>  		ret = -EBUSY;
>>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
>>  		mutex_lock(&kvm->lock);
>> -		if (!kvm->created_vcpus) {
>> +		if (kvm->created_vcpus)
>> +			ret = -EBUSY;
>> +		else if (kvm->mm->context.uses_gmap_hpage)
>> +			ret = -EINVAL;
>> +		else {
>>  			kvm->arch.use_cmma = 1;
>>  			/* Not compatible with cmma. */
>>  			kvm->arch.use_pfmfi = 0;
>> @@ -4086,6 +4116,11 @@ static int __init kvm_s390_init(void)
>>  		return -ENODEV;
>>  	}
>>  
>> +	if (nested && hpage) {
>> +		pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
>> +		return -EINVAL;
>> +	}
>> +
>>  	for (i = 0; i < 16; i++)
>>  		kvm_s390_fac_base[i] |=
>>  			S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index c691b9d9d223..84c6bad06949 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2,8 +2,10 @@
>>  /*
>>   *  KVM guest address space mapping code
>>   *
>> - *    Copyright IBM Corp. 2007, 2016
>> + *    Copyright IBM Corp. 2007, 2016, 2017
>>   *    Author(s): Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
>> + *		 David Hildenbrand <david@xxxxxxxxxx>
>> + *		 Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
>>   */
>>  
>>  #include <linux/kernel.h>
>> @@ -595,8 +597,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>  		return -EFAULT;
>>  	pmd = pmd_offset(pud, vmaddr);
>>  	VM_BUG_ON(pmd_none(*pmd));
>> -	/* large pmds cannot yet be handled */
>> -	if (pmd_large(*pmd))
>> +	/* Are we allowed to use huge pages? */
>> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
>>  		return -EFAULT;
>>  	/* Link gmap segment table entry location to page table. */
>>  	rc = radix_tree_preload(GFP_KERNEL);
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index b6270a3b38e9..ea007e900acd 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_GET_MSR_FEATURES 153
>>  #define KVM_CAP_HYPERV_EVENTFD 154
>>  #define KVM_CAP_HYPERV_TLBFLUSH 155
>> +#define KVM_CAP_S390_HPAGE 156
> 
> Should we even change that to KVM_CAP_S390_1MB_HPAGE ?

I could live with KVM_CAP_S390_HPAGE_1M

> 
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>>
> 
> 

Attachment:
signature.asc

Description: OpenPGP digital signature