Re: [PATCH 03/11] kvm: x86: Add fast CR3 switch code path

Liran Alon <liran.alon@xxxxxxxxxx> · Mon, 14 May 2018 14:55:47 -0700 (PDT)

----- junaids@xxxxxxxxxx wrote:

> On 05/13/2018 02:49 PM, Liran Alon wrote:
> >>>
> >>> For now, this fast path is implemented only for 64-bit guests and
> >>> hosts,
> >>> but it can be extended later to 32-bit guests and/or hosts as
> well.
> >>
> >> It is unclear to me from the commit message and patch itself why
> this
> >> decision was made.
> >> It is best if you clarify in commit message and comments what is
> the
> >> complexity of 32-bit guests/hosts.
> >>
> 
> Ok. I'll try to add some comments regarding it. The additional
> complexity
> would be handling of the PDPTEs. Shouldn't be too difficult to do. I
> just
> skipped it based on the assumption that 32-bit platforms wouldn't be
> that
> common nowadays.

ACK. BTW you could also ask who runs KVM with Intel VT-x but without EPT today? :P
I personally have encountered this SMMU PCID issue because I had a hypervisor
which needed to run on machines without Intel VT-x support at all (binary-translation based).

> 
> >>>  
> >>> +#define KVM_MMU_ROOT_CURRENT	BIT(0)
> >>> +#define KVM_MMU_ROOT_PREVIOUS	BIT(1)
> >>> +#define KVM_MMU_ROOTS_ALL	(KVM_MMU_ROOT_CURRENT |
> >>> KVM_MMU_ROOT_PREVIOUS)
> >>> +
> >>
> >> I think it is a bit weird that this patch introduce "uint
> >> roots_to_free" as flags
> >> while it is really used as a bool: Either you free both prev &
> active
> >> root or you free just active root.
> >> Therefore it makes more sense at this point for parameter to be
> "bool
> >> free_prev_root".
> 
> You are correct that a bool free_prev_root is sufficient as far as
> this
> patch alone is concerned. That is actually how I wrote it initially.
> However, it would not be sufficient for patch 10, which does need to
> selectively free the prev_root only without freeing the current root.

I agree but this is not how patches should be made in my opinion.
I think every patch should make sense in it's own.
It makes sense that at this patch it will be a bool and that a later patch will then modify this
to flags when needed.

> 
> >> Another weird effect of this is what does it mean to call
> >> kvm_mmu_free_roots() with roots_to_free=0?
> >> Interface seems a bit strange.
> >>
> 
> Calling with root_to_free=0 would just be a no-op.

I understood that. :)
I just mentioned it is a bit weird interface.

> 
> >>> +void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, uint
> roots_to_free)
> >>>  {
> >>>  	int i;
> >>>  	LIST_HEAD(invalid_list);
> >>>  	struct kvm_mmu *mmu = &vcpu->arch.mmu;
> >>> +	bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT;
> >>> +	bool free_prev_root = roots_to_free & KVM_MMU_ROOT_PREVIOUS;
> >>>  
> >>> -	if (!VALID_PAGE(mmu->root_hpa))
> >>> +	if ((!VALID_PAGE(mmu->root_hpa) || !free_active_root) &&
> >>> +	    (!VALID_PAGE(mmu->prev_root_hpa) || !free_prev_root))
> >>>  		return;
> >>
> >> Shouldn't the condition instead be:
> >> if ((free_active_root && !VALID_PAGE(mmu->root_hpa)) ||
> >>     (free_prev_root && !VALID_PAGE(mmu->prev_root_hpa))
> >>     return;
> >> ?
> 
> No, it is acceptable for the caller to call this without checking
> that
> the root that is supposed to be freed already exists or not. So if,
> for
> instance, free_active_root is true but root_hpa is not valid, we
> should
> not just return if free_prev_root is also true and the prev_root_hpa
> is
> valid.

ACK. You are right. Thanks for clarifying this.

> 
> >>
> >> In general, it seems that all the diff regarding fast_cr3_switch()
> and
> >> saving mmu.prev_cr3
> >> is unrelated to the optimization described in the commit message:
> >> Optimization of the common case where CR3 was changed while no
> guest
> >> page table changes happened in between.
> >> For this optimization, we only need the change regarding
> >> mmu_sync_roots() to not grab mmu_lock when
> >> (!root->unsync && !root->unsync_children).
> >> Am I missing something?
> > 
> > I still think that this should be split to 2 separate patches.
> > The first only optimizing mmu_sync_roots() to not grab mmu_lock().
> > The second optimizing to not free previous root SPT
> > (as we are probably about to return to it because of how KPTI
> works).
> > 
> 
> Ok. I can split out the mmu_sync_roots() change into a separate patch.
> But
> please note that it alone doesn't make the CR3 switch lock-free. The
> second
> patch would then be the actual fast CR3 switch patch, including the
> fast_cr3_switch() function and the code to not free the previous
> root.

You are correct.
I just think smaller patches are easier to review & bisect.

> 
> Thanks,
> Junaid