On Wed, May 22, 2013 at 11:25:13PM +0800, Xiao Guangrong wrote:
> On 05/22/2013 09:17 PM, Gleb Natapov wrote:
> > On Wed, May 22, 2013 at 05:41:10PM +0800, Xiao Guangrong wrote:
> >> On 05/22/2013 04:54 PM, Gleb Natapov wrote:
> >>> On Wed, May 22, 2013 at 04:46:04PM +0800, Xiao Guangrong wrote:
> >>>> On 05/22/2013 02:34 PM, Gleb Natapov wrote:
> >>>>> On Tue, May 21, 2013 at 10:33:30PM -0300, Marcelo Tosatti wrote:
> >>>>>> On Tue, May 21, 2013 at 11:39:03AM +0300, Gleb Natapov wrote:
> >>>>>>>> Any pages with stale information will be zapped by kvm_mmu_zap_all().
> >>>>>>>> When that happens, page faults will take place which will automatically
> >>>>>>>> use the new generation number.
> >>>>>>>>
> >>>>>>>> So still not clear why is this necessary.
> >>>>>>>>
> >>>>>>> This is not, strictly speaking, necessary, but it is the sane thing to do.
> >>>>>>> You cannot update page's generation number to prevent it from been
> >>>>>>> destroyed since after kvm_mmu_zap_all() completes stale ptes in the
> >>>>>>> shadow page may point to now deleted memslot. So why build shadow page
> >>>>>>> table with a page that is in a process of been destroyed?
> >>>>>>
> >>>>>> OK, can this be introduced separately, in a later patch, with separate
> >>>>>> justification, then?
> >>>>>>
> >>>>>> Xiao please have the first patches of the patchset focus on the problem
> >>>>>> at hand: fix long mmu_lock hold times.
> >>>>>>
> >>>>>>> Not sure what you mean again. We flush TLB once before entering this function.
> >>>>>>> kvm_reload_remote_mmus() does this for us, no?
> >>>>>>
> >>>>>> kvm_reload_remote_mmus() is used as an optimization, its separate from the
> >>>>>> problem solution.
> >>>>>>
> >>>>>>>>
> >>>>>>>> What was suggested was... go to phrase which starts with "The only purpose
> >>>>>>>> of the generation number should be to".
> >>>>>>>>
> >>>>>>>> The comment quoted here does not match that description.
> >>>>>>>>
> >>>>>>> The comment describes what code does and in this it is correct.
> >>>>>>>
> >>>>>>> You propose to not reload roots right away and do it only when root sp
> >>>>>>> is encountered, right? So my question is what's the point? There are,
> >>>>>>> obviously, root sps with invalid generation number at this point, so
> >>>>>>> reload will happen regardless in kvm_mmu_prepare_zap_page(). So why not
> >>>>>>> do it here right away and avoid it in kvm_mmu_prepare_zap_page() for
> >>>>>>> invalid and obsolete sps as I proposed in one of my email?
> >>>>>>
> >>>>>> Sure. But Xiao please introduce that TLB collapsing optimization as a
> >>>>>> later patch, so we can reason about it in a more organized fashion.
> >>>>>
> >>>>> So, if I understand correctly, you are asking to move is_obsolete_sp()
> >>>>> check from kvm_mmu_get_page() and kvm_reload_remote_mmus() from
> >>>>> kvm_mmu_invalidate_all_pages() to a separate patch. Fine by me, but if
> >>>>> we drop kvm_reload_remote_mmus() from kvm_mmu_invalidate_all_pages() the
> >>>>> call to kvm_mmu_invalidate_all_pages() in emulator_fix_hypercall() will
> >>>>> become nop. But I question the need to zap all shadow pages tables there
> >>>>> in the first place, why kvm_flush_remote_tlbs() is not enough?
> >>>>
> >>>> I do not know too... I even do no know why kvm_flush_remote_tlbs
> >>>> is needed. :(
> >>> We changed the content of an executable page, we need to flush instruction
> >>> cache of all vcpus to not use stale data, so my suggestion to call
> >>
> >> I thought the reason is about icache too but icache is automatically
> >> flushed on x86, we only need to invalidate the prefetched instructions by
> >> executing a serializing operation.
> >>
> >> See the SDM in the chapter of
> >> "8.1.3 Handling Self- and Cross-Modifying Code"
> >>
> > Right, so we do cross-modifying code here and we need to make sure no
> > vcpu is running in a guest mode while this happens, but
> > kvm_mmu_zap_all() does not provide this guaranty since vcpus will
> > continue running after reloading roots!
>
> May be we can introduce a function to atomic write gpa, then the guest
> either 1) see the old value, in that case, it can be intercepted or
> 2) see the the new value in that case, it can continue to execute.
>
SDM says atomic write is not enough. All vcpus should be guaranteed not
to execute code in the vicinity of the modified code. This is easy to
achieve though:

vcpu0:
lock(x);
make_all_cpus_request(EXIT);
unlock(x);

vcpuX:
if (kvm_check_request(EXIT)) {
    lock(x);
    unlock(x);
}

> >>> kvm_flush_remote_tlbs() is obviously incorrect since this flushes tlb,
> >>> not instruction cache, but why kvm_reload_remote_mmus() would flush
> >>> instruction cache?
> >>
> >> kvm_reload_remote_mmus do not have any help i think.
> >>
> >> I find that this change is introduced by commit: 7aa81cc0
> >> and I have added Anthony in the CC.
> >>
> >> I also find some discussions related to calling
> >> kvm_reload_remote_mmus():
> >>
> >>>
> >>> But if the instruction is architecture dependent, and you run on the
> >>> wrong architecture, now you have to patch many locations at fault time,
> >>> introducing some nasty runtime code / data cache overlap performance
> >>> problems. Granted, they go away eventually.
> >>>
> >>
> >> We're addressing that by blowing away the shadow cache and holding the
> >> big kvm lock to ensure SMP safety. Not a great thing to do from a
> >> performance perspective but the whole point of patching is that the cost
> >> is amortized.
> >>
> >> (http://kerneltrap.org/mailarchive/linux-kernel/2007/9/14/260288)
> >>
> >> But i can not understand...
> > Back then kvm->lock protected memslot access so code like:
> >
> > mutex_lock(&vcpu->kvm->lock);
> > kvm_mmu_zap_all(vcpu->kvm);
> > mutex_unlock(&vcpu->kvm->lock);
> >
> > which is what 7aa81cc0 does was enough to guaranty that no vcpu will
> > run while code is patched.
>
> So, at that time, kvm->lock is also held when #PF is being fixed?
>
It was, and also during kvm_mmu_load() which is called during vcpu
entry after roots are zapped.

> > This is no longer the case and
> > mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago,
> > so now kvm_mmu_zap_all() there is useless and the code is incorrect.
> >
> > Lets drop kvm_mmu_zap_all() there (in separate patch) and fix the
> > patching properly later.
>
> Will do.
>

--
			Gleb.
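
The lock/request handshake proposed earlier in this message (vcpu0 takes
x and requests an exit from every vcpu; each vcpuX that sees the request
takes and drops x before re-entering the guest) can be modeled in user
space. What follows is only a rough sketch with pthreads and C11 atomics,
not KVM code. Everything in it is hypothetical: the names in_guest,
exit_request, run_guest(), patch_code() and run_patch_demo(), the choice
to patch between the request and unlock(x), and the explicit wait for
vcpus to leave "guest mode", which the sketch assumes is what the exit
request implies.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS   4
#define CODE_WORDS 8

static pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER; /* lock "x" from the pseudocode */
static atomic_bool exit_request[NR_VCPUS];  /* stand-in for a per-vcpu EXIT request */
static atomic_bool in_guest[NR_VCPUS];      /* hypothetical "vcpu is in guest mode" flag */
static atomic_int  code[CODE_WORDS];        /* simulated guest code, all words equal */
static atomic_int  torn_reads;              /* times a vcpu saw half-patched code */
static atomic_bool stop;

/* "Execute" guest code: every word must carry the same patch version. */
static void run_guest(void)
{
    int first = atomic_load(&code[0]);
    for (int w = 1; w < CODE_WORDS; w++) {
        if (atomic_load(&code[w]) != first) {
            atomic_fetch_add(&torn_reads, 1);
            break;
        }
    }
}

/* vcpuX: if (kvm_check_request(EXIT)) { lock(x); unlock(x); } */
static void *vcpu_thread(void *arg)
{
    int id = (int)(intptr_t)arg;
    while (!atomic_load(&stop)) {
        atomic_store(&in_guest[id], true);            /* announce entry first ... */
        if (atomic_exchange(&exit_request[id], false)) { /* ... then check the request */
            atomic_store(&in_guest[id], false);
            pthread_mutex_lock(&x);                   /* blocks until patching is done */
            pthread_mutex_unlock(&x);
            continue;
        }
        run_guest();
        atomic_store(&in_guest[id], false);
    }
    return NULL;
}

/* vcpu0: lock(x); make_all_cpus_request(EXIT); <patch>; unlock(x); */
static void patch_code(int version)
{
    pthread_mutex_lock(&x);
    for (int i = 0; i < NR_VCPUS; i++)
        atomic_store(&exit_request[i], true);
    for (int i = 0; i < NR_VCPUS; i++)        /* assumed: wait until nobody is in guest mode */
        while (atomic_load(&in_guest[i]))
            sched_yield();
    for (int w = 0; w < CODE_WORDS; w++)      /* the cross-modifying write itself */
        atomic_store(&code[w], version);
    pthread_mutex_unlock(&x);
}

/* Runs `rounds` patches against NR_VCPUS vcpu threads; returns torn reads seen. */
int run_patch_demo(int rounds)
{
    pthread_t tid[NR_VCPUS];

    atomic_store(&stop, false);
    atomic_store(&torn_reads, 0);
    for (int w = 0; w < CODE_WORDS; w++)
        atomic_store(&code[w], 0);
    for (int i = 0; i < NR_VCPUS; i++) {
        atomic_store(&exit_request[i], false);
        atomic_store(&in_guest[i], false);
        int rc = pthread_create(&tid[i], NULL, vcpu_thread, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int r = 1; r <= rounds; r++)
        patch_code(r);
    atomic_store(&stop, true);
    for (int i = 0; i < NR_VCPUS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&torn_reads);
}
```

Calling run_patch_demo(200) from a small main() (built with something
like "cc -pthread") should return 0, since a vcpu thread only runs
run_guest() after it has published in_guest and found no pending
request, and the patcher only writes after every in_guest flag has
dropped. Removing either the lock/unlock pair in vcpu_thread() or the
wait loop in patch_code() lets a vcpu observe a partially patched
array, which is the failure the handshake is meant to rule out.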