Re: [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 19, 2022, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> 
> The guest runstate area can be arbitrarily byte-aligned. In fact, even
> when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
> fields in the structure end up being unaligned due to the fact that the
> 32-bit ABI only aligns them to 32 bits.
> 
> So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
> is buggy, because if it's unaligned then we can't update the whole field
> atomically; the low bytes might be observable before the _UPDATE bit is.
> Xen actually updates the *byte* containing that top bit, on its own. KVM
> should do the same.

I think we're using the wrong APIs to update the runstate.  The VMCS/VMCB pages
_need_ the host pfn, i.e. need to use a gpc (eventually).  The Xen PV stuff on the
other hand most definitely doesn't need to know the pfn.

The event channel code would be difficult to convert due to lack of uaccess
primitives, but I don't see anything in the runstate code that prevents KVM from
using a gfn_to_hva_cache.  That will naturally handle page splits by sending them
down a slow path and would yield far simpler code.

If taking the slow path is an issue, then the guest really should be fixed to not
split pages.  And if that's not an acceptable answer, the gfn_to_hva_cache code
could be updated to use the fast path if the region is contiguous in the host
virtual address space.

> +static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
>  {
>  	struct kvm_vcpu_xen *vx = &v->arch.xen;
> -	u64 now = get_kvmclock_ns(v->kvm);
> -	u64 delta_ns = now - vx->runstate_entry_time;
> -	u64 run_delay = current->sched_info.run_delay;
> +	struct gfn_to_pfn_cache *gpc1 = &vx->runstate_cache;
> +	struct gfn_to_pfn_cache *gpc2 = &vx->runstate2_cache;
> +	size_t user_len, user_len1, user_len2;
> +	struct vcpu_runstate_info rs;
> +	int *rs_state = &rs.state;
> +	unsigned long flags;
> +	size_t times_ofs;
> +	u8 *update_bit;
>  
> -	if (unlikely(!vx->runstate_entry_time))
> -		vx->current_runstate = RUNSTATE_offline;
> +	/*
> +	 * The only difference between 32-bit and 64-bit versions of the
> +	 * runstate struct us the alignment of uint64_t in 32-bit, which

s/us/is

> +	 * means that the 64-bit version has an additional 4 bytes of
> +	 * padding after the first field 'state'.
> +	 */
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != 0);
> +	BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state) != 0);
> +	BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c);
> +#ifdef CONFIG_X86_64
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) !=
> +		     offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4);
> +	BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) !=
> +		     offsetof(struct compat_vcpu_runstate_info, time) + 4);
> +#endif
> +
> +	if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) {
> +		user_len = sizeof(struct vcpu_runstate_info);
> +		times_ofs = offsetof(struct vcpu_runstate_info,
> +				     state_entry_time);
> +	} else {
> +		user_len = sizeof(struct compat_vcpu_runstate_info);
> +		times_ofs = offsetof(struct compat_vcpu_runstate_info,
> +				     state_entry_time);
> +		rs_state++;

...

> +	*rs_state = vx->current_runstate;
> +#ifdef CONFIG_X86_64
> +	/* Don't leak kernel memory through the padding in the 64-bit struct */
> +	if (rs_state == &rs.state)
> +		rs_state[1] = 0;

Oof, that's difficult to follow.  Rather than pointer magic, what about zeroing
the first word unconditionally?  Likely faster than a CMP+CMOV or whatever gets
generated.  Or just zero the entire struct.

	/* Blah blah blah */
	*(unsigned long *)&rs.state = 0;

> +#endif



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux