Christian Ehrhardt wrote:
The bad thing about vcpu->requests in this case is that I don't want
the asynchronous behaviour of vcpu->requests here; I want the
memory slot updated in all vcpus by the time the ioctl returns.
You mean, the hardware can access the vcpu control block even when
the vcpu is not running?
No, hardware only uses it with a running vcpu, but I realised my own
mistake while changing the code to the vcpu->requests style.
For s390 I need to update the kvm->arch data and *all* the
vcpu->arch.sie_block... data synchronously.
Out of interest, can you explain why?
Sure I'll try to give an example.
a) The whole guest has "one" memory slot representing all its memory.
Therefore some important values, like guest_origin and guest_memsize
(one slot, so it's just addr+size), are kept at VM level in kvm->arch.
It should really be kept in kvm->memslots[0].{userspace_addr, npages}.
This is common to all architectures.
b) We fortunately have cool hardware support for "nearly
everything"(tm) :-) In this case, for example, we set in
vcpu->arch.sie_block the values for origin and size, translated into a
"limit", to get memory management virtualization support.
x86 has something analogous; shadow or nested page tables are also
per-vcpu and accessed by the hardware while the guest is running.
c) We have other code, e.g. all our copy_from/to_guest stuff, that uses
the kvm->arch values.
You want to drop these and use kvm_read_guest() / kvm_write_guest().
If we allowed e.g. updates of a memslot (or, as the patch supposes,
hardened the set_memory_region code against inconsiderate code
changes in other sections), it might happen that we set the kvm->arch
information but not the vcpu->arch.sie_block data until the next
reentry. Meanwhile the running vcpu could concurrently take some kind
of fault that involves a copy_from/to_guest. That way we could end up
with potentially invalid handling of that fault (fault handling and the
running guest would use different userspace addresses until they are
synced on the next vcpu reentry). It's theoretical, I know, but it might
cause some issues that would be hard to find.
I agree it should be protected. Here's how we do it in arch-independent
code:
- code that looks up memory slots takes slots_lock for read (future: rcu)
- code that changes memory slots takes slots_lock for write, and
requests an mmu reload (includes an IPI to force the vcpu out of guest mode)
Now, once we begin changing a slot no one can touch memory or reenter
the guest until we are done.
On the other hand, for the long term, I wanted to note that all our
copy_from/to_guest functions are per-vcpu, so when we some day
implement updateable memslots, multiple memslots, or even just find
"free time"(tm) to streamline our code, we could redesign that
origin/size storage. This could be done in multiple ways: either
store it per vcpu, or protect the kvm->arch level variables with a lock.
Both ways, and maybe more, could then use the vcpu->requests based
approach, but unfortunately that is neither part of this patch nor of
the current effort.
I think we should keep that in generic code. All of that applies to x86
(and ia64 and ppc), if I understand you correctly, and if I understand
the other archs correctly (don't place a large bet).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.