Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

On Thu, Aug 29, 2013 at 07:26:40PM +0800, Xiao Guangrong wrote:
> On 08/29/2013 05:51 PM, Gleb Natapov wrote:
> > On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
> >>> As Documentation/RCU/whatisRCU.txt says:
> >>>
> >>>         As with rcu_assign_pointer(), an important function of
> >>>         rcu_dereference() is to document which pointers are protected by
> >>>         RCU, in particular, flagging a pointer that is subject to changing
> >>>         at any time, including immediately after the rcu_dereference().
> >>>         And, again like rcu_assign_pointer(), rcu_dereference() is
> >>>         typically used indirectly, via the _rcu list-manipulation
> >>>         primitives, such as list_for_each_entry_rcu().
> >>>
> >>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
> >>> important. The code is complicated, so self documentation will not hurt.
> >>> I want to see what is actually protected by rcu here. Freeing shadow
> >>> pages with call_rcu() further complicates matters: does it mean that
> >>> shadow pages are also protected by rcu? 
> >>
> >> Yes, it stops the shadow page from being freed while we are
> >> write-protecting it.
> >>
> > Yeah, I got the trick. What I am saying is that we have a data structure
> > here protected by RCU, but we do not use RCU functions to access it...
> 
> Yes, they are not used when inserting an spte into the rmap and getting the
> rmap from the entry... but do we need these functions to guarantee the order?
> 
> The worst case is that we fetch the spte from the desc before the spte has
> been updated. We can happily skip such an spte, since the dirty-bitmap will
> be set later; this is guaranteed by the barrier between mmu_spte_update()
> and mark_page_dirty(). The code is:
> 
> set_spte():
> 
> 	if (mmu_spte_update(sptep, spte))
> 		kvm_flush_remote_tlbs(vcpu->kvm);
> 
> 	if (!remap) {
> 		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
> 			rmap_recycle(vcpu, sptep, gfn);
> 
> 		if (level > PT_PAGE_TABLE_LEVEL)
> 			++vcpu->kvm->stat.lpages;
> 	}
> 
> 	smp_wmb();
> 
> 	if (pte_access & ACC_WRITE_MASK)
> 		mark_page_dirty(vcpu->kvm, gfn);
> 
> So, I guess if we can guarantee the order ourselves, we do not need
> to call the rcu functions explicitly...
> 
> But the memory barriers in the rcu functions are really light on x86 (a store
> cannot be reordered with another store), so I do not mind using them
> explicitly if you think this way is safer. :)
> 
I think the self-documentation aspect of using the rcu functions is also
important.
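
For illustration only, a minimal userspace sketch of what that annotation
looks like. C11 atomics stand in for rcu_assign_pointer()/rcu_dereference()
here; struct desc, struct rmap_head, publish_pointer(), subscribe_pointer()
and the two helpers are names made up for the example, not the KVM code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pte-list descriptor; not the KVM struct. */
struct desc {
	uint64_t spte;
	_Atomic(struct desc *) more;
};

struct rmap_head {
	_Atomic(struct desc *) first;
};

/* Userspace stand-in for rcu_assign_pointer(): a release store. */
#define publish_pointer(p, v) \
	atomic_store_explicit((p), (v), memory_order_release)

/* Userspace stand-in for rcu_dereference(): an acquire load. */
#define subscribe_pointer(p) \
	atomic_load_explicit((p), memory_order_acquire)

/* Writer: fully initialise the descriptor, then publish it at the head. */
static void rmap_publish(struct rmap_head *head, struct desc *d, uint64_t spte)
{
	assert(head != NULL && d != NULL);
	d->spte = spte;
	atomic_store_explicit(&d->more,
			      atomic_load_explicit(&head->first,
						   memory_order_relaxed),
			      memory_order_relaxed);
	publish_pointer(&head->first, d);
}

/* Reader: every pointer it follows goes through the marked accessor. */
static size_t rmap_count(struct rmap_head *head)
{
	size_t n = 0;
	struct desc *d;

	for (d = subscribe_pointer(&head->first); d;
	     d = subscribe_pointer(&d->more))
		n++;
	return n;
}
```

The accessors add little cost, but anyone reading rmap_count() can see at a
glance which pointers may change under the walker.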

> > BTW why not allocate sp->spt from SLAB_DESTROY_BY_RCU cache too? We may
> > switch write protection on a random spt occasionally if page is deleted
> > and reused for another spt though. For last level spt it should not be a
> > problem and for non last level we have is_last_spte() check in
> > __rmap_write_protect_lockless(). Can it work?
> 
> Yes, I also considered this way. It can work if we handle is_last_spte()
> properly. Since the sp->spt can be reused, we cannot get the mapping
> level from sp. We need to encode the mapping level into the spte so that
> cmpxchg can tell whether the page table has been moved to another mapping
> level.
Isn't one bit that says that spte is the last one enough? IIRC we
have one more ignored bit to spare in spte.
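
Roughly what I have in mind, as a userspace sketch rather than a patch.
PT_WRITABLE_MASK is the usual x86 R/W bit; SPTE_LAST_MASK and its bit
position, as well as the function name, are assumptions made up for the
example, and C11 atomics stand in for the kernel's cmpxchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ULL << 1)	/* x86 PTE R/W bit */
#define SPTE_LAST_MASK	 (1ULL << 62)	/* hypothetical "last level" bit */

/*
 * Clear the writable bit only if the spte is still marked as a last-level
 * entry. Returns true if this call removed write access.
 */
static bool spte_write_protect_lockless(_Atomic uint64_t *sptep)
{
	uint64_t old;

	assert(sptep != NULL);
	old = atomic_load(sptep);

	for (;;) {
		/* Not a last-level spte, or already read-only: leave it. */
		if (!(old & SPTE_LAST_MASK) || !(old & PT_WRITABLE_MASK))
			return false;

		/*
		 * The compare covers the whole spte, so if the page was
		 * reused and the entry changed, the exchange fails, "old"
		 * is refreshed and the checks above run again.
		 */
		if (atomic_compare_exchange_weak(sptep, &old,
						 old & ~PT_WRITABLE_MASK))
			return true;
	}
}
```

Since the bit lives in the spte itself, the walker never has to look at sp
to decide whether the entry is a last-level one.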

>         Could you allow me to make this optimization separately, after this
> patchset is merged?
> 
If you think it will complicate the initial version I am fine with
postponing it for later.

--
			Gleb.