On Thu, Aug 29, 2013 at 07:26:40PM +0800, Xiao Guangrong wrote:
> On 08/29/2013 05:51 PM, Gleb Natapov wrote:
> > On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
> >>> As Documentation/RCU/whatisRCU.txt says:
> >>>
> >>>         As with rcu_assign_pointer(), an important function of
> >>>         rcu_dereference() is to document which pointers are protected by
> >>>         RCU, in particular, flagging a pointer that is subject to changing
> >>>         at any time, including immediately after the rcu_dereference().
> >>>         And, again like rcu_assign_pointer(), rcu_dereference() is
> >>>         typically used indirectly, via the _rcu list-manipulation
> >>>         primitives, such as list_for_each_entry_rcu().
> >>>
> >>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
> >>> important. The code is complicated, so self documentation will not hurt.
> >>> I want to see what is actually protected by rcu here. Freeing shadow
> >>> pages with call_rcu() further complicates matters: does it mean that
> >>> shadow pages are also protected by rcu?
> >>
> >> Yes, it stops shadow page to be freed when we do write-protection on
> >> it.
> >>
> > Yeah, I got the trick, what I am saying that we have a data structure
> > here protected by RCU, but we do not use RCU functions to access it...
>
> Yes, they are not used when insert a spte into rmap and get the rmap from
> the entry... but do we need to use these functions to guarantee the order?
>
> The worst case is, we fetch the spte from the desc but the spte is not
> updated yet, we can happily skip this spte since it will set the
> dirty-bitmap later, this is guaranteed by the barrier between mmu_spte_update()
> and mark_page_dirty(), the code is:
>
> set_spte():
>
> 	if (mmu_spte_update(sptep, spte))
> 		kvm_flush_remote_tlbs(vcpu->kvm);
>
> 	if (!remap) {
> 		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
> 			rmap_recycle(vcpu, sptep, gfn);
>
> 		if (level > PT_PAGE_TABLE_LEVEL)
> 			++vcpu->kvm->stat.lpages;
> 	}
>
> 	smp_wmb();
>
> 	if (pte_access & ACC_WRITE_MASK)
> 		mark_page_dirty(vcpu->kvm, gfn);
>
> So, I guess if we can guarantee the order ourselves, we do not need
> to call the rcu functions explicitly...
>
> But, the memory barriers in the rcu functions are really light on x86 (store
> can not be reordered with store), so I do not mind explicitly using them
> if you think this way is safer. :)
>
I think the self documentation aspect of using rcu functions is also
important.

> > BTW why not allocate sp->spt from SLAB_DESTROY_BY_RCU cache too? We may
> > switch write protection on a random spt occasionally if page is deleted
> > and reused for another spt though. For last level spt it should not be a
> > problem and for non last level we have is_last_spte() check in
> > __rmap_write_protect_lockless(). Can it work?
>
> Yes, I also considered this way. It can work if we handle is_last_spte()
> properly. Since the sp->spte can be reused, we can not get the mapping
> level from sp. We need to encode the mapping level into spte so that
> cmpxchg can understand if the page table has been moved to another mapping
> level.
Isn't one bit that says that spte is the last one enough? IIRC we
have one more ignored bit to spare in spte.

> Could you allow me to make this optimization separately after this
> patchset is merged?
>
If you think it will complicate the initial version I am fine with
postponing it for later.

--
			Gleb.
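
For illustration only, here is a minimal userspace sketch of the "one bit in the spte" idea discussed above. It is not code from the patchset. It uses C11 atomics in place of the kernel's cmpxchg, and the names SPTE_LAST_LEVEL and spte_write_protect_lockless() as well as the choice of bit 62 are hypothetical. The point it tries to show: if the "last level" marker lives in the spte itself, a single compare-and-exchange checks the marker and clears the writable bit together, so a page table that was recycled for another level between the read and the update makes the exchange fail and get re-evaluated.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 is the R/W bit of an x86 page table entry. */
#define SPTE_WRITABLE   (1ULL << 1)
/* Hypothetical: a spare software bit marking a last-level spte. */
#define SPTE_LAST_LEVEL (1ULL << 62)

/*
 * Hypothetical helper: clear the writable bit without holding a lock.
 * Returns true only if this call removed write access from a
 * last-level spte.
 */
static bool spte_write_protect_lockless(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load_explicit(sptep, memory_order_relaxed);

	for (;;) {
		/* Leave non-last-level and already read-only entries alone. */
		if (!(old & SPTE_LAST_LEVEL) || !(old & SPTE_WRITABLE))
			return false;

		/*
		 * The exchange succeeds only if the whole spte, marker bit
		 * included, is unchanged. On failure 'old' is refreshed and
		 * the checks above run again on the new value.
		 */
		if (atomic_compare_exchange_weak_explicit(sptep, &old,
				old & ~SPTE_WRITABLE,
				memory_order_acq_rel, memory_order_relaxed))
			return true;
	}
}
```

A caller walking the rmap would invoke the helper on each sptep it finds and treat a false return as "nothing to do for this entry". Whether a single bit is sufficient in the real code, or the full mapping level has to be encoded, is exactly the open question in the thread.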