Re: [PATCH 1/4] KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support

Fabiano Rosas <farosas@xxxxxxxxxxxxx> · Wed, 20 Jan 2021 11:27:57 -0300

Nicholas Piggin <npiggin@xxxxxxxxx> writes:

> Excerpts from Fabiano Rosas's message of January 20, 2021 7:07 am:
>> Nicholas Piggin <npiggin@xxxxxxxxx> writes:
>> 
>>> Excerpts from Fabiano Rosas's message of January 19, 2021 11:46 am:
>>>> Resending because the previous got spam-filtered:
>>>> 
>>>> Nicholas Piggin <npiggin@xxxxxxxxx> writes:
>>>> 
>>>>> This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT
>>>>> guests on POWER9 radix hosts"), which was required to run HPT guests on
>>>>> RPT hosts on early POWER9 CPUs without support for "mixed mode", which
>>>>> meant the host could not run with MMU on while guests were running.
>>>>>
>>>>> This code has some corner case bugs, e.g., when the guest hits a machine
>>>>> check or HMI the primary locks up waiting for secondaries to switch LPCR
>>>>> to host, which they never do. This could all be fixed in software, but
>>>>> most CPUs in production have mixed mode support, and those that don't
>>>>> are believed to be all in installations that don't use this capability.
>>>>> So simplify things and remove support.
>>>> 
>>>> With this patch in a DD2.1 machine + indep_threads_mode=N +
>>>> disable_radix, QEMU aborts and dumps registers, is that intended?
>>>
>>> Yes. That configuration is hanging handling MCEs in the guest with some 
>>> threads waiting forever to synchronize. Paul suggested it was never a
>>> supported configuration so we might just remove it.
>>>
>> 
>> OK, so:
>> 
>> Tested-by: Fabiano Rosas <farosas@xxxxxxxxxxxxx>
>> 
>>>> Could we use the 'no_mixing_hpt_and_radix' logic in check_extension to
>>>> advertise only KVM_CAP_PPC_MMU_RADIX to the guest via OV5 so it doesn't
>>>> try to run hash?
>>>> 
>>>> For instance, if I hack QEMU's 'spapr_dt_ov5_platform_support' from
>>>> OV5_MMU_BOTH to OV5_MMU_RADIX_300 then it boots succesfuly, but the
>>>> guest turns into radix, due to this code in prom_init:
>>>> 
>>>> prom_parse_mmu_model:
>>>> 
>>>> case OV5_FEAT(OV5_MMU_RADIX): /* Only Radix */
>>>> 	prom_debug("MMU - radix only\n");
>>>> 	if (prom_radix_disable) {
>>>> 		/*
>>>> 		 * If we __have__ to do radix, we're better off ignoring
>>>> 		 * the command line rather than not booting.
>>>> 		 */
>>>> 		prom_printf("WARNING: Ignoring cmdline option disable_radix\n");
>>>> 	}
>>>> 	support->radix_mmu = true;
>>>> 	break;
>>>> 
>>>> It seems we could explicitly say that the host does not support hash and
>>>> that would align with the above code.
>>>
>>> I'm not sure, sounds like you could, on the other hand these aborts seem 
>>> like the prefered failure mode for these kinds of configuration issues, 
>>> I don't know what the policy is, is reverting back to radix acceptable?
>>>
>> 
>> Yeah, I couldn't find documentation about why we're reverting back to
>> radix. I personally dislike it, but there is already a precedent so I'm
>> not sure. A radix guest on a hash host does the same transparent
>> conversion AFAICT.
>> 
>> But despite that, this patch removes support for hash MMU in this
>> particular scenario. I don't see why continuing to tell the guest we
>> support hash.
>> 
>> Anyway, here's a patch if you decide to go that way (tested w/ DD2.1 &
>> 2.3 machines):
>
> Thanks, I don't mind it, have to see if the maintainer will take it :)
>
> You could add a small changelog / SOB and I could putit after my patch?
>

Sure, I'll reply to this thread with a proper patch.

>> 
>> ---
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 0a056c64c317b..53743555676d6 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -314,6 +314,7 @@ struct kvmppc_ops {
>>  			      int size);
>>  	int (*enable_svm)(struct kvm *kvm);
>>  	int (*svm_off)(struct kvm *kvm);
>> +	bool (*hash_possible)(void);
>>  };
>>  
>>  extern struct kvmppc_ops *kvmppc_hv_ops;
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 6f612d240392f..2d1e8aba22b85 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -5599,6 +5599,15 @@ static int kvmhv_svm_off(struct kvm *kvm)
>>  	return ret;
>>  }
>>  
>> +static bool kvmppc_hash_possible(void)
>> +{
>> +	if (radix_enabled() && no_mixing_hpt_and_radix)
>> +		return false;
>> +
>> +	return cpu_has_feature(CPU_FTR_ARCH_300) &&
>> +		cpu_has_feature(CPU_FTR_HVMODE);
>> +}
>
> Just be careful, it's hash_v3 specifically. Either make this return true 
> for arch < 300 add the ARCH_300 check in the ioctl, or rename to include
> v3.
>
>> +
>>  static struct kvmppc_ops kvm_ops_hv = {
>>  	.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
>>  	.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
>> @@ -5642,6 +5651,7 @@ static struct kvmppc_ops kvm_ops_hv = {
>>  	.store_to_eaddr = kvmhv_store_to_eaddr,
>>  	.enable_svm = kvmhv_enable_svm,
>>  	.svm_off = kvmhv_svm_off,
>> +	.hash_possible = kvmppc_hash_possible,
>>  };
>>  
>
> How about adding an op which can check extensions? It could return false
> if it wasn't checked and so default to the generic checks in 
> kvm_vm_ioctl_check_extension, and take a pointer to 'r' to set.
>

I'm not sure I get the part about "return false if it wasn't
checked". Do you mean like this?

static bool kvmhv_check_extension(long ext, int *r)
{
	switch (ext) {
	case KVM_CAP_PPC_MMU_HASH_V3:
		r = kvmppc_hash_v3_possible();
		break;
	default:
		return false;
	}
	return true;
}

And then we could move all of the #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
cases into it and early in kvm_vm_ioctl_check_extension have something
like:

if (hv_enabled && kvmppc_hv_ops->check_extension(ext, &r))
	return r;

>>  static int kvm_init_subcore_bitmap(void)
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index cf52d26f49cd7..99ced6c570e74 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -611,8 +611,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  		r = !!(hv_enabled && radix_enabled());
>>  		break;
>>  	case KVM_CAP_PPC_MMU_HASH_V3:
>> -		r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) &&
>> -		       cpu_has_feature(CPU_FTR_HVMODE));
>> +		r = !!(hv_enabled && kvmppc_hv_ops->hash_possible());
>>  		break;
>>  	case KVM_CAP_PPC_NESTED_HV:
>>  		r = !!(hv_enabled && kvmppc_hv_ops->enable_nested &&
>
> Thanks,
> Nick