Hi,

On 11/05/18 12:43, Paolo Bonzini wrote:
> On 11/05/2018 13:02, Andre Przywara wrote:
>> Hi Paolo,
>>
>> thanks for the answer!
>> Took me a bit, but I think you are right (see below).
>>
>> On 10/05/18 18:43, Paolo Bonzini wrote:
>>> On 10/05/2018 19:41, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> Jan posted a lockdep splat complaining about a suspicious
>>>> rcu_dereference_check:
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html
>>>>
>>>> The gist of that is:
>>>> ...
>>>> [ 1025.695517]  dump_stack+0x9c/0xd4
>>>> [ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
>>>> [ 1025.695537]  gfn_to_memslot+0x174/0x190
>>>> [ 1025.695546]  kvm_read_guest+0x50/0xb0
>>>> [ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
>>>> ...
>>>> I chased that down and wonder if kvm_read_guest{,_page} is supposed
>>>> to be called inside a kvm->srcu critical section?
>>>>
>>>> We have a check that suggests that eventually someone needs to enter
>>>> the SRCU critical section:
>>>>
>>>> static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
>>>>                                                   int as_id)
>>>> {
>>>>         return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
>>>>                         lockdep_is_held(&kvm->slots_lock) ||
>>>>                         !refcount_read(&kvm->users_count));
>>>> }
>>>>
>>>> If I get this correctly, this means that for accessing kvm->memslots
>>>> we either need to be inside an SRCU critical section or hold
>>>> kvm->slots_lock (for updates only).
>>>>
>>>> If I am not mistaken, it is not necessary for *callers* of
>>>> kvm_read_guest_page() to do this, as this could be entirely
>>>> contained inside this function - since we only use the reference to
>>>> the memslot entry while doing the copy_from_user(), and the data is
>>>> safe afterwards from an RCU point of view because it has been
>>>> *copied*.
>>>
>>> Yes, it's the caller's responsibility. srcu_read_lock/unlock is
>>> pretty expensive
>>
>> Is that so? I was under the impression that declaring RCU critical
>> sections is very cheap, is that different with SRCU?
>
> Yes, because RCU effectively lets the scheduler do the expensive parts.
> With SRCU you have to do them yourself, with the advantage that: 1) you
> can sleep during SRCU critical sections; 2) synchronize_srcu is much
> cheaper than synchronize_rcu and synchronize_sched.
>
> It is still relatively cheap, and it doesn't serialize against writers,
> but the order of magnitude is 100 clock cycles for each of lock and
> unlock. Compared with rcu_read_lock/unlock, which are nops on any
> kernel but PREEMPT_RT, that counts as expensive. :)
>
>>> so KVM assumes that the topmost callers do it.
>>
>> OK, fair enough. And with some hints from Jörg I understand now that
>> x86 and s390 do a "srcu_read_lock(&kvm->srcu);" right after leaving
>> the guest and unlock it only shortly before entering again, so that
>> any intermediate calls are protected. That leaves the locking duty
>> only up to calls originating from userspace.
>> But AFAICT neither mips, powerpc nor arm/arm64 are doing this. I am
>> checking now whether this is an omission or whether they are really
>> doing fine-grained locking for all memslot accesses.
>
> Ok, let me Cc the maintainers. I suppose at least some of them do use
> lockdep from time to time, but it is certainly possible that some cases
> have been missed.

Thanks for that. As far as I can see, mips seems to be safe, because
they don't use kvm_read_guest() and the other memslot references seem
to be properly protected. So James can enjoy this weekend ;-)

For powerpc it's a bit more complex; I tried to chase down at least
the four users of kvm_read_guest():
- The one in arch/powerpc/kvm/book3s_rtas.c is safe.
- The two users in arch/powerpc/kvm/book3s_64_mmu_radix.c don't seem
  to take any locks, but are only called when creating the VM, so
  that's supposedly somewhat safe (?)
- I couldn't find any protection for the usage in
  arch/powerpc/kvm/powerpc.c, but the call chain is quite convoluted
  there, so I might have missed something. It would be good if someone
  more familiar with this code would take a look.

> Adding the srcu_read_lock/unlock directly in kvm_arch_vcpu_ioctl_run
> and any other ioctls that need it is best, but in any case adding more
> pairs is safe because they can be nested.

So I added a small wrapper around kvm_read_guest(), which takes and
drops the lock.
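Just to illustrate the idea, the wrapper boils down to something like
the below (a sketch only - the name is made up here, and the actual
patch may look slightly different):

static inline int kvm_read_guest_lock(struct kvm *kvm, gpa_t gpa,
				      void *data, unsigned long len)
{
	/* Enter the SRCU critical section before touching the memslots. */
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_read_guest(kvm, gpa, data, len);

	/* The data has been *copied*, so the memslot may change now. */
	srcu_read_unlock(&kvm->srcu, srcu_idx);

	return ret;
}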
Will send out the patch shortly. If powerpc needs it, I am happy to
provide this wrapper in kvm_main.c instead of some arm header file.

Cheers,
Andre.
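P.S.: For reference, the per-run-loop locking that Paolo describes for
x86 and s390 is essentially the following pattern (an illustrative
sketch, not the actual x86 code - run_the_guest() is just a placeholder
for the respective arch's run loop):

	int ret, srcu_idx;

	srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);

	/*
	 * Everything in here - MMIO emulation, kvm_read_guest() and
	 * friends - can dereference kvm->memslots without taking any
	 * further locks. The lock only needs to be dropped around the
	 * actual guest entry, where the memslots are not accessed.
	 */
	ret = run_the_guest(vcpu);

	srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);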