On Fri, Jan 24, 2025 at 21:11, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> Heh, except it's all kinds of broken.

Yes, I didn't even try.

> IMO, biting the bullet and converting to an SRCU-protected list is
> going to be far less work in the long run.

I did try a long SRCU critical section and it was unreviewable. It ends up
a lot less manageable than just making the lock a leaf, especially as the
lock hierarchy spans multiple subsystems (static key, KVM, cpufreq --
thanks, CPU hotplug lock...). I also didn't like adding a synchronization
primitive that's... kinda single-use, but that would not be a blocker of
course.

So the second attempt was regular RCU, which looked a lot like this patch.
I started writing all the dances to find a struct kvm that passes
kvm_get_kvm_safe() before you do rcu_read_unlock() and drop the previous
one (because you cannot do kvm_put_kvm() within the RCU read side), and
then set the idea aside, incorrectly thinking that those dances would not
be needed with kvm_lock. Plus, I didn't like having to keep a bunch of
data alive for a whole grace period if call_rcu() is used.

So for the third attempt the choice was between SRCU and just using
kvm_lock. To be honest I didn't even think of SRCU, because everything so
far had looked so bad; but it does seem a little better than regular RCU.
At least, if kvm_destroy_vm() uses call_srcu(), you can call kvm_put_kvm()
within srcu_read_lock()...srcu_read_unlock(). It would look something like

        int idx;

        idx = srcu_read_lock(&kvm_srcu);
        list_for_each_entry_srcu(kvm, &vm_list, vm_list, 1) {
                if (!kvm_get_kvm_safe(kvm))
                        continue;

                /* kvm is protected by the reference count now. */
                srcu_read_unlock(&kvm_srcu, idx);
                ...
                idx = srcu_read_lock(&kvm_srcu);

                /*
                 * kvm stays alive, and the next entry can be read,
                 * until the next srcu_read_unlock().
                 */
                kvm_put_kvm(kvm);
        }
        srcu_read_unlock(&kvm_srcu, idx);

but again I am not sure how speedy call_srcu() is in reclaiming the data,
even in the common case where set_nx_huge_pages() or any other SRCU reader
(none of them is frequent) isn't running. If you don't use call_srcu(), it
becomes just as bad as RCU or kvm_lock.

So... let's talk about kvm_lock.

> > @@ -7143,16 +7141,19 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
> > +                   kvm_get_kvm(kvm);
>
> This needs to be:
>
>         if (!kvm_get_kvm_safe(kvm))
>                 continue;

If we go for kvm_lock, kvm_get_kvm() *can* be made safe within the
critical section: if kvm_put_kvm() uses refcount_dec_and_mutex_lock(),
then the 1->0 transition happens under kvm_lock and cannot race with
kvm_get_kvm(). The mutex can be dropped as soon as
refcount_dec_and_mutex_lock() returns; it's really just the decrement that
needs to be within the critical section. (Sketched below.)

> >     if (new_val != old_val) {
> >             struct kvm *kvm;
> >
> > -           mutex_lock(&kvm_lock);
> > -
> >             list_for_each_entry(kvm, &vm_list, vm_list) {
>
> This is unsafe, as vm_list can be modified while kvm_lock is dropped.  And
> using list_for_each_entry_safe() doesn't help, because the _next_ entry may
> have been freed.

list_for_each_entry_safe() is broken, but list_for_each_entry() can be
used. The problem is the call to kvm_put_kvm(), which is where the kvm
struct is freed, thus breaking the foreach loop. So the next entry must be
read and ref'd _before_ kvm_put_kvm(); then you can do

        kvm_get_kvm(kvm);
        mutex_unlock(&kvm_lock);
        if (prev)
                kvm_put_kvm(prev);
        ...
        mutex_lock(&kvm_lock);
        prev = kvm;
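Spelled out, the whole walk would look something like this -- only an
untested sketch in set_nx_huge_pages() terms, where the zap under
slots_lock mirrors what the loop body does today, and which assumes the
refcount_dec_and_mutex_lock() change above so that plain kvm_get_kvm() is
safe under kvm_lock:

        struct kvm *kvm, *prev = NULL;

        mutex_lock(&kvm_lock);
        list_for_each_entry(kvm, &vm_list, vm_list) {
                /* Plain, not _safe: the 1->0 transition takes kvm_lock. */
                kvm_get_kvm(kvm);
                mutex_unlock(&kvm_lock);

                /* prev's refcount can reach 0 here, outside the lock. */
                if (prev)
                        kvm_put_kvm(prev);

                mutex_lock(&kvm->slots_lock);
                kvm_mmu_zap_all_fast(kvm);
                mutex_unlock(&kvm->slots_lock);

                /*
                 * Holding a reference keeps kvm on vm_list, so reading
                 * the next entry under kvm_lock is fine.
                 */
                mutex_lock(&kvm_lock);
                prev = kvm;
        }
        mutex_unlock(&kvm_lock);
        if (prev)
                kvm_put_kvm(prev);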
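And for completeness, the refcount_dec_and_mutex_lock() flavor of
kvm_put_kvm() I mentioned above would be something like the following --
again a sketch, assuming the list_del(&kvm->vm_list) moves out of
kvm_destroy_vm() and into the 0-refcount path here, since kvm_lock is
already held at that point:

        void kvm_put_kvm(struct kvm *kvm)
        {
                if (!refcount_dec_and_mutex_lock(&kvm->users_count,
                                                 &kvm_lock))
                        return;

                /* Refcount is 0 and kvm_lock is held. */
                list_del(&kvm->vm_list);

                /*
                 * Only the decrement needs the lock: with the count at 0
                 * and the VM off vm_list, nobody can take a new
                 * reference, so the teardown can run outside kvm_lock.
                 */
                mutex_unlock(&kvm_lock);
                kvm_destroy_vm(kvm);
        }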
I don't know... there are few enough readers that SRCU seems a bit out of
place, and it has the issue of keeping the VM data alive; kvm_lock, on the
other hand, has uglier code, with the kvm_put_kvm() looking really
misplaced. If there were many instances one could write a nice iterator,
but for just one use? Hmm... I wonder if something like

        if (poll_state_synchronize_srcu(&kvm_srcu,
                                        get_state_synchronize_srcu(&kvm_srcu))) {
                kvm_destroy_vm_cb(&kvm->rcu_head);
        } else {
                call_srcu(&kvm_srcu, &kvm->rcu_head, kvm_destroy_vm_cb);
        }

catches the case where there's no concurrent reader. If so, SRCU would
undoubtedly be the winner, but being the only user of a tricky RCU API
doesn't give me warm and fuzzy feelings. I'm still team kvm_lock for now.

Anyhow, I can prepare a tested version next Monday, with either kvm_lock
or with SRCU if the above trick works. Unless I've shown that it's
trickier than it seems and successfully nerd-sniped you. Seriously - just
tell me what you prefer.

Paolo