On 28/04/21 18:40, Ben Gardon wrote:
On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
On 28/04/21 00:36, Ben Gardon wrote:
+void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+ struct kvm_memslots *slots)
+{
+ mutex_lock(&kvm->arch.memslot_assignment_lock);
+ rcu_assign_pointer(kvm->memslots[as_id], slots);
+ mutex_unlock(&kvm->arch.memslot_assignment_lock);
+}
Does the assignment also needs the lock, or only the rmap allocation? I
would prefer the hook to be something like kvm_arch_setup_new_memslots.
The assignment does need to be under the lock to prevent the following race:
1. Thread 1 (installing a new memslot): Acquires memslot assignment
lock (or perhaps in this case rmap_allocation_lock would be more apt.)
2. Thread 1: Check alloc_memslot_rmaps (it is false)
3. Thread 1: doesn't allocate memslot rmaps for new slot
4. Thread 1: Releases memslot assignment lock
5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock
6. Thread 2: Sets alloc_memslot_rmaps = true
7. Thread 2: Allocates rmaps for all existing slots
8. Thread 2: Releases memslot assignment lock
9. Thread 2: Sets shadow_mmu_active = true
10. Thread 1: Installs the new memslots
11. Thread 3: Null pointer dereference when trying to access rmaps on
the new slot.
... because thread 3 would be under mmu_lock and therefore cannot
allocate the rmap itself (you have to do it in mmu_alloc_shadow_roots,
as in patch 6).
Related to this, your solution does not have to protect kvm_dup_memslots
with the new lock, because the first update of the memslots will not go
through kvm_arch_prepare_memory_region but it _will_ go through
install_new_memslots and therefore through the new hook. But overall I
think I'd prefer to have a kvm->slots_arch_lock mutex in generic code,
and place the call to kvm_dup_memslots and
kvm_arch_prepare_memory_region inside that mutex.
That makes the new lock decently intuitive, and easily documented as
"Architecture code can use slots_arch_lock if the contents of struct
kvm_arch_memory_slot needs to be written outside
kvm_arch_prepare_memory_region. Unlike slots_lock, slots_arch_lock can
be taken inside a ``kvm->srcu`` read-side critical section".
I admit I haven't thought about it very thoroughly, but if something
like this is enough, it is relatively pretty:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9b8e30dd5b9b..6e5106365597 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1333,6 +1333,7 @@ static struct kvm_memslots
*install_new_memslots(struct kvm *kvm,
rcu_assign_pointer(kvm->memslots[as_id], slots);
+ mutex_unlock(&kvm->slots_arch_lock);
synchronize_srcu_expedited(&kvm->srcu);
/*
@@ -1399,6 +1398,7 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memslots *slots;
int r;
+ mutex_lock(&kvm->slots_arch_lock);
slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
if (!slots)
return -ENOMEM;
@@ -1427,6 +1427,7 @@ static int kvm_set_memslot(struct kvm *kvm,
* - kvm_is_visible_gfn (mmu_check_root)
*/
kvm_arch_flush_shadow_memslot(kvm, slot);
+ mutex_lock(&kvm->slots_arch_lock);
}
r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
It does make the critical section a bit larger, so that the
initialization of the shadow page (which is in KVM_RUN context) contends
with slightly more code than necessary. However it's all but a
performance critical situation, as it will only happen just once per VM.
WDYT?
Paolo
Putting the assignment under the lock prevents 5-8 from happening
between 2 and 10.
I'm open to other ideas as far as how to prevent this race though. I
admit this solution is not the most elegant looking.