On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> So, use the short size at activation time as well.  This means
> re-activating the cache if the guest requests the hypercall page
> multiple times with different word sizes (this can happen when
> kexec-ing, for example).

I don't understand the motivation for allowing a conditionally valid GPA.
I see a lot of complexity and sub-optimal error handling for a use case
that no one cares about.  Realistically, userspace is never going to
provide a GPA that only works some of the time, because doing otherwise
is just asking for a dead guest.

> +static int kvm_xen_reactivate_runstate_gpcs(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	unsigned long i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (vcpu->arch.xen.runstate_cache.active) {

This is not safe when called from kvm_xen_write_hypercall_page(), which
doesn't acquire kvm->lock and thus doesn't guard against a concurrent
call via kvm_xen_vcpu_set_attr().

That's likely a bug in itself, but even if that issue is fixed, I don't
see how this yields a better uAPI than forcing userspace to provide an
address that is valid for all modes.  If the address becomes bad when
the guest changes the hypercall page, the guest is completely hosed
through no fault of its own, whereas limiting the misaligned detection
to KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR means that any "bad" address
will result in immediate failure, i.e. makes it so that errors are 100%
userspace misconfiguration bugs.

> +			int r = kvm_xen_activate_runstate_gpc(vcpu,
> +					vcpu->arch.xen.runstate_cache.gpa);
> +			if (r < 0)

Returning immediately is wrong, as later vCPUs will have a valid, active
cache that hasn't been verified for 64-bit mode.

> +				return r;
> +		}
> +	}
> +	return 0;
> +}
> +
>  void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
> @@ -212,11 +243,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>  	if (!vx->runstate_cache.active)
>  		return;
>  
> -	if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode)
> -		user_len = sizeof(struct vcpu_runstate_info);
> -	else
> -		user_len = sizeof(struct compat_vcpu_runstate_info);
> -
> +	user_len = kvm_xen_runstate_info_size(v->kvm);
>  	read_lock_irqsave(&gpc->lock, flags);
>  	while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
>  					   user_len)) {
> @@ -461,7 +488,7 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		mutex_lock(&kvm->lock);
>  		kvm->arch.xen.long_mode = !!data->u.long_mode;
>  		mutex_unlock(&kvm->lock);
> -		r = 0;
> +		r = kvm_xen_reactivate_runstate_gpcs(kvm);

Needs to be called under kvm->lock.  This path also needs to acquire
kvm->srcu.

>  	}
>  	break;
> 
> @@ -596,9 +623,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
>  			break;
>  		}
> 
> -		r = kvm_gpc_activate(vcpu->kvm, &vcpu->arch.xen.runstate_cache,
> -				     NULL, KVM_HOST_USES_PFN, data->u.gpa,
> -				     sizeof(struct vcpu_runstate_info));
> +		r = kvm_xen_activate_runstate_gpc(vcpu, data->u.gpa);
>  		break;
> 
>  	case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT:
> @@ -843,9 +868,13 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
>  	u32 page_num = data & ~PAGE_MASK;
>  	u64 page_addr = data & PAGE_MASK;
>  	bool lm = is_long_mode(vcpu);
> +	int r;
> 
>  	/* Latch long_mode for shared_info pages etc. */
>  	vcpu->kvm->arch.xen.long_mode = lm;
> +	r = kvm_xen_reactivate_runstate_gpcs(kvm);
> +	if (r < 0)
> +		return 1;

Aren't we just making up behavior at this point?  Injecting a #GP into
the guest for what is a completely legal operation from the guest's
perspective seems all kinds of wrong.
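To make the early-return concern concrete, here's a standalone userspace
sketch (not kernel code; `fake_vcpu`, `activate_one`, and both
`reactivate_all_*` helpers are invented for illustration).  It models a
runstate cache as a (gpa, active) pair whose mapped length grows when
long mode flips, and contrasts the early-return loop with one possible
alternative that visits every vCPU and deactivates any cache that fails
re-validation:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RUNSTATE_LONG_SIZE 48U   /* stand-in for sizeof(struct vcpu_runstate_info) */
#define FAKE_PAGE_SIZE     4096U

struct fake_vcpu {
	bool     active;         /* models runstate_cache.active */
	uint64_t gpa;            /* models runstate_cache.gpa */
};

/* Models re-activation: fails if (gpa, len) would cross a page boundary,
 * which can start happening only once long_mode grows the mapped length. */
static int activate_one(struct fake_vcpu *v, unsigned int len)
{
	if ((v->gpa & (FAKE_PAGE_SIZE - 1)) + len > FAKE_PAGE_SIZE)
		return -EINVAL;
	return 0;
}

/* The pattern under review: stops at the first failure, so later vCPUs
 * keep caches that are still marked active but were never re-checked
 * against the new (larger) size. */
static int reactivate_all_early_return(struct fake_vcpu *v, size_t n,
				       unsigned int len)
{
	for (size_t i = 0; i < n; i++) {
		if (!v[i].active)
			continue;
		int r = activate_one(&v[i], len);
		if (r < 0)
			return r;
	}
	return 0;
}

/* One possible alternative: sweep every vCPU, deactivate any cache that
 * fails re-validation, and report the first error afterwards. */
static int reactivate_all_full_sweep(struct fake_vcpu *v, size_t n,
				     unsigned int len)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (!v[i].active)
			continue;
		int r = activate_one(&v[i], len);
		if (r < 0) {
			v[i].active = false;   /* don't leave a stale mapping */
			if (!ret)
				ret = r;
		}
	}
	return ret;
}
```

The full-sweep variant only illustrates the loop-termination point; it
doesn't address the review's larger objection that a conditionally valid
GPA shouldn't be accepted in the first place.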