On Sun, Dec 15, 2019 at 12:33:02PM -0500, Peter Xu wrote:
> On Thu, Dec 12, 2019 at 01:08:14AM +0100, Paolo Bonzini wrote:
> > >>> What depends on what here? Looks suspicious ...
> > >>
> > >> Hmm, I think maybe it can be removed because the entry pointer
> > >> reference below should be an ordering constraint already?
> >
> > entry->xxx depends on ring->reset_index.
>
> Yes that's true, but...
>
>         entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>         /* barrier? */
>         next_slot = READ_ONCE(entry->slot);
>         next_offset = READ_ONCE(entry->offset);
>
> ... I think entry->xxx depends on entry first, then entry depends on
> reset_index.  So it seems fine because all things have a dependency?

Is reset_index changed from another thread then?  If yes then you want
to read reset_index with READ_ONCE.  That includes a dependency barrier.
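
Roughly like this (untested sketch against the snippet quoted above,
with a local "index" variable added just for illustration, not taken
from the actual patch):

        index = READ_ONCE(ring->reset_index);
        entry = &ring->dirty_gfns[index & (ring->size - 1)];
        /*
         * entry->slot and entry->offset are loaded through a pointer
         * that depends on the READ_ONCE() above, so the address
         * dependency orders these loads after the index load.
         */
        next_slot = READ_ONCE(entry->slot);
        next_offset = READ_ONCE(entry->offset);
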
> >
> > >>> what's the story around locking here? Why is it safe
> > >>> not to take the lock sometimes?
> > >>
> > >> kvm_dirty_ring_push() will be with lock==true only when the per-vm
> > >> ring is used.  For per-vcpu ring, because that will only happen with
> > >> the vcpu context, then we don't need locks (so kvm_dirty_ring_push()
> > >> is called with lock==false).
> >
> > FWIW this will be done much more nicely in v2.
> >
> > >>>> +        page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > >>>> +        if (!page) {
> > >>>> +                r = -ENOMEM;
> > >>>> +                goto out_err_alloc_page;
> > >>>> +        }
> > >>>> +        kvm->vm_run = page_address(page);
> > >>>
> > >>> So 4K with just 8 bytes used. Not as bad as 1/2Mbyte for the ring but
> > >>> still. What is wrong with just a pointer and calling put_user?
> > >>
> > >> I want to make it the start point for sharing fields between
> > >> user/kernel per-vm.  Just like kvm_run for per-vcpu.
> >
> > This page is actually not needed at all.  Userspace can just map at
> > KVM_DIRTY_LOG_PAGE_OFFSET, the indices reside there.  You can drop
> > kvm_vm_run completely.
>
> I changed it because otherwise we use one entry of the padding, and
> all the rest of paddings are a waste of memory because we can never
> really use the padding as new fields only for the 1st entry which
> overlaps with the indices.  IMHO that could even waste more than 4k.
>
> (for now we only "waste" 4K for per-vm, kvm_run is already mapped so
> no waste there, not to say potentially I still think we can use the
> kvm_vm_run in the future)
>
> >
> > >>>> +        } else {
> > >>>> +                /*
> > >>>> +                 * Put onto per vm ring because no vcpu context.  Kick
> > >>>> +                 * vcpu0 if ring is full.
> > >>>
> > >>> What about tasks on vcpu 0? Do guests realize it's a bad idea to put
> > >>> critical tasks there, they will be penalized disproportionally?
> > >>
> > >> Reasonable question.  So far we can't avoid it because vcpu exit is
> > >> the event mechanism to say "hey please collect dirty bits".  Maybe
> > >> someway is better than this, but I'll need to rethink all these
> > >> over...
> > >
> > > Maybe signal an eventfd, and let userspace worry about deciding what to
> > > do.
> >
> > This has to be done synchronously.  But the vm ring should be used very
> > rarely (it's for things like kvmclock updates that write to guest memory
> > outside a vCPU), possibly a handful of times in the whole run of the VM.
>
> I've summarized a list of callers who might dirty guest memory in the
> other thread, it seems to me that even the kvm clock is using per-vcpu
> contexts.
>
> >
> > >>> KVM_DIRTY_RING_MAX_ENTRIES is not part of UAPI.
> > >>> So how does userspace know what's legal?
> > >>> Do you expect it to just try?
> > >>
> > >> Yep that's what I thought. :)
> >
> > We should return it for KVM_CHECK_EXTENSION.
>
> OK.  I'll drop the versioning.
>
> --
> Peter Xu
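
PS: on the KVM_CHECK_EXTENSION point above, userspace could then probe
the limit before sizing the ring, along these lines (sketch only:
"kvm_fd" stands for the /dev/kvm or VM fd, and KVM_CAP_DIRTY_LOG_RING
is just a placeholder for whatever capability name and return-value
semantics the series ends up defining):

        int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);

        if (ret <= 0) {
                /* capability absent: dirty ring not supported, fall back */
        } else {
                /* ret advertises the limit userspace may ask for */
        }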