@all, trim your replies!

On Tue, Jan 03, 2023, Vipin Sharma wrote:
> On Tue, Jan 3, 2023 at 10:01 AM Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 29, 2022 at 1:55 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> > > > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > > >  static unsigned long
> > > >  mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > >  {
> > > > -	struct kvm *kvm;
> > > > -	int nr_to_scan = sc->nr_to_scan;
> > > > +	struct kvm_mmu_memory_cache *cache;
> > > > +	struct kvm *kvm, *first_kvm = NULL;
> > > >  	unsigned long freed = 0;
> > > > +	/* spinlock for memory cache */
> > > > +	spinlock_t *cache_lock;
> > > > +	struct kvm_vcpu *vcpu;
> > > > +	unsigned long i;
> > > >
> > > >  	mutex_lock(&kvm_lock);
> > > >
> > > >  	list_for_each_entry(kvm, &vm_list, vm_list) {
> > > > -		int idx;
> > > > -		LIST_HEAD(invalid_list);
> > > > -
> > > > -		/*
> > > > -		 * Never scan more than sc->nr_to_scan VM instances.
> > > > -		 * Will not hit this condition practically since we do not try
> > > > -		 * to shrink more than one VM and it is very unlikely to see
> > > > -		 * !n_used_mmu_pages so many times.
> > > > -		 */
> > > > -		if (!nr_to_scan--)
> > > > +		if (first_kvm == kvm)
> > > >  			break;
> > > > -		/*
> > > > -		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > > > -		 * here. We may skip a VM instance errorneosly, but we do not
> > > > -		 * want to shrink a VM that only started to populate its MMU
> > > > -		 * anyway.
> > > > -		 */
> > > > -		if (!kvm->arch.n_used_mmu_pages &&
> > > > -		    !kvm_has_zapped_obsolete_pages(kvm))
> > > > -			continue;
> > > > +		if (!first_kvm)
> > > > +			first_kvm = kvm;
> > > > +		list_move_tail(&kvm->vm_list, &vm_list);
> > > >
> > > > -		idx = srcu_read_lock(&kvm->srcu);
> > > > -		write_lock(&kvm->mmu_lock);
> > > > +		kvm_for_each_vcpu(i, vcpu, kvm) {
> > >
> > > What protects this from racing with vCPU creation/deletion?
> > >
> vCPU deletion:
> We take kvm_lock in mmu_shrink_scan(), and the same lock is taken in
> kvm_destroy_vm() to remove a VM from vm_list. So, once we are
> iterating vm_list we will not see any VM removal, which means no
> vCPU removal.
>
> I didn't find any other code for vCPU deletion except failures during
> VM and vCPU setup. A VM is only added to vm_list after successful
> creation.

Yep, KVM doesn't support destroying/freeing a vCPU after it's been added.

> vCPU creation:
> I think it will work.
>
> kvm_vm_ioctl_create_vcpu() initializes the vCPU and adds it to
> kvm->vcpu_array, which is an xarray managed by RCU. Only after this is
> online_vcpus incremented. So kvm_for_each_vcpu(), which uses RCU to
> read entries, will also see all of the vCPU initialization if it sees
> the incremented online_vcpus value.

Yep.  The shrinker may race with a vCPU creation, e.g. not process a
just-created vCPU, but that's totally ok in this case since the shrinker
path is best effort (and purging the caches of a newly created vCPU is
likely pointless).

> @Sean, Paolo
>
> Is the above explanation correct, i.e. is kvm_for_each_vcpu() safe without any lock?

Well, in this case, you do need to hold kvm_lock ;-)

But yes, iterating over vCPUs without holding the per-VM kvm->lock is
safe; the caller just needs to ensure the VM can't be destroyed, i.e.
either needs to hold a reference to the VM or needs to hold kvm_lock.
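
For reference, a minimal sketch of the pattern being discussed (this is not
the patch itself; walk_all_vcpu_caches() and the per-vCPU work are made-up
placeholders): kvm_lock pins every VM on vm_list for the duration of the
walk, and kvm_for_each_vcpu() reads kvm->vcpu_array via RCU, so no per-VM
kvm->lock is needed.

#include <linux/kvm_host.h>

/*
 * Illustrative sketch only.  Holding kvm_lock prevents any VM on
 * vm_list (and therefore any of its vCPUs) from being destroyed while
 * we iterate.  kvm_for_each_vcpu() walks kvm->vcpu_array under RCU; a
 * vCPU created concurrently may simply be missed, which is acceptable
 * for a best-effort path like the shrinker.
 */
static void walk_all_vcpu_caches(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm *kvm;
	unsigned long i;

	mutex_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		kvm_for_each_vcpu(i, vcpu, kvm) {
			/* per-vCPU work, e.g. purging MMU memory caches */
		}
	}
	mutex_unlock(&kvm_lock);
}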