On 04/24/2012 09:37 AM, Xiao Guangrong wrote:
> On 04/24/2012 12:16 AM, Avi Kivity wrote:
> 
> > Using RCU for lockless shadow walking can increase the amount of memory
> > in use by the system, since RCU grace periods are unpredictable. We also
> > have an unconditional write to a shared variable (reader_counter), which
> > isn't good for scaling.
> > 
> > Replace that with a scheme similar to x86's get_user_pages_fast(): disable
> > interrupts during lockless shadow walk to force the freer
> > (kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
> > processor with interrupts enabled.
> > 
> > We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
> > kvm_flush_remote_tlbs() from avoiding the IPI.
> > 
> > Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
> > ---
> > 
> > Turned out to be simpler than expected. However, I think there's a problem
> > with make_all_cpus_request() possibly reading an incorrect vcpu->cpu.
> 
> It seems possible.
> 
> Can we fix it by reading vcpu->cpu when the vcpu is in GUEST_MODE or
> EXITING_GUEST_MODE (IIRC, in these modes, interrupts are disabled)?
> 
> Like:
> 
> 	if (kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> 		cpumask_set_cpu(vcpu->cpu, cpus);

I think it is actually okay.  We are only vulnerable if the lockless shadow
walk started during prepare_zap_page() and extends past
kvm_flush_remote_tlbs(), yes?  But in that case, vcpu->cpu is stable, since
local_irq_disable() kills preemption.

> > 
> >  static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> >  {
> > -	/* Decrease the counter after walking shadow page table finished */
> > -	smp_mb__before_atomic_dec();
> > -	atomic_dec(&vcpu->kvm->arch.reader_counter);
> > -	rcu_read_unlock();
> 
> We need a mb here to prevent the setting of vcpu->mode from being reordered
> ahead of the spte reads/writes? (it is safe on x86, but we need a comment at
> least?)

I don't think so.  Documentation/memory-barriers.txt says:

  Any atomic operation that modifies some state in memory and returns
  information about the state (old or new) implies an SMP-conditional general
  memory barrier (smp_mb()) on each side of the actual operation (with the
  exception of explicit lock operations, described later).  These include:

	xchg();
	cmpxchg();
	atomic_cmpxchg();
	atomic_inc_return();
	atomic_dec_return();
	atomic_add_return();
	atomic_sub_return();
	atomic_inc_and_test();
	atomic_dec_and_test();
	atomic_sub_and_test();
	atomic_add_negative();
	atomic_add_unless();	/* when succeeds (returns 1) */
	test_and_set_bit();
	test_and_clear_bit();
	test_and_change_bit();

-- 
error compiling committee.c: too many arguments to function
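
For readers without the patch in hand, a minimal sketch of what the scheme
described in the changelog could look like.  Only walk_shadow_page_lockless_end()
appears in the hunk quoted above; the matching _begin() helper, the comments,
and the exact barrier placement here are illustrative, not the committed code.
READING_SHADOW_PAGE_TABLES, OUTSIDE_GUEST_MODE, and kvm_mmu_commit_zap_page()
are the names used in the thread.

	/* Sketch only -- kernel-internal KVM context assumed. */
	#include <linux/kvm_host.h>

	static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
	{
		/*
		 * With interrupts off, the TLB flush IPI sent by
		 * kvm_mmu_commit_zap_page() cannot be serviced on this cpu,
		 * so the freer is forced to wait until the walk finishes.
		 */
		local_irq_disable();

		/*
		 * Advertise the walk so kvm_flush_remote_tlbs() does not
		 * decide it can skip the IPI for this vcpu.
		 */
		vcpu->mode = READING_SHADOW_PAGE_TABLES;
		smp_mb();
	}

	static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
	{
		/*
		 * Whether an explicit barrier is needed before leaving the
		 * mode is the point being debated above.
		 */
		smp_mb();
		vcpu->mode = OUTSIDE_GUEST_MODE;
		local_irq_enable();
	}

The effect is that kvm_mmu_commit_zap_page() cannot complete its
kvm_flush_remote_tlbs() IPI while any vcpu sits between these two calls, so
the shadow pages it is about to free stay valid for the lockless walker.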