Re: [PATCH 03/11] kvm: x86: Add fast CR3 switch code path

Junaid Shahid <junaids@xxxxxxxxxx> · Mon, 14 May 2018 13:58:35 -0700

On 05/13/2018 02:49 PM, Liran Alon wrote:
>>>
>>> For now, this fast path is implemented only for 64-bit guests and
>>> hosts,
>>> but it can be extended later to 32-bit guests and/or hosts as well.
>>
>> It is unclear to me from the commit message and patch itself why this
>> decision was made.
>> It is best if you clarify in commit message and comments what is the
>> complexity of 32-bit guests/hosts.
>>

Ok. I'll try to add some comments regarding it. The additional complexity
would be handling of the PDPTEs. Shouldn't be too difficult to do. I just
skipped it based on the assumption that 32-bit platforms wouldn't be that
common nowadays.

>>>  
>>> +#define KVM_MMU_ROOT_CURRENT	BIT(0)
>>> +#define KVM_MMU_ROOT_PREVIOUS	BIT(1)
>>> +#define KVM_MMU_ROOTS_ALL	(KVM_MMU_ROOT_CURRENT |
>>> KVM_MMU_ROOT_PREVIOUS)
>>> +
>>
>> I think it is a bit weird that this patch introduce "uint
>> roots_to_free" as flags
>> while it is really used as a bool: Either you free both prev & active
>> root or you free just active root.
>> Therefore it makes more sense at this point for parameter to be "bool
>> free_prev_root".

You are correct that a bool free_prev_root is sufficient as far as this
patch alone is concerned. That is actually how I wrote it initially.
However, it would not be sufficient for patch 10, which does need to
selectively free the prev_root only without freeing the current root.

>> Another weird effect of this is what does it mean to call
>> kvm_mmu_free_roots() with roots_to_free=0?
>> Interface seems a bit strange.
>>

Calling with root_to_free=0 would just be a no-op.

>>> +void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, uint roots_to_free)
>>>  {
>>>  	int i;
>>>  	LIST_HEAD(invalid_list);
>>>  	struct kvm_mmu *mmu = &vcpu->arch.mmu;
>>> +	bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT;
>>> +	bool free_prev_root = roots_to_free & KVM_MMU_ROOT_PREVIOUS;
>>>  
>>> -	if (!VALID_PAGE(mmu->root_hpa))
>>> +	if ((!VALID_PAGE(mmu->root_hpa) || !free_active_root) &&
>>> +	    (!VALID_PAGE(mmu->prev_root_hpa) || !free_prev_root))
>>>  		return;
>>
>> Shouldn't the condition instead be:
>> if ((free_active_root && !VALID_PAGE(mmu->root_hpa)) ||
>>     (free_prev_root && !VALID_PAGE(mmu->prev_root_hpa))
>>     return;
>> ?

No, it is acceptable for the caller to call this without checking that
the root that is supposed to be freed already exists or not. So if, for
instance, free_active_root is true but root_hpa is not valid, we should
not just return if free_prev_root is also true and the prev_root_hpa is
valid.

>>
>> In general, it seems that all the diff regarding fast_cr3_switch() and
>> saving mmu.prev_cr3
>> is unrelated to the optimization described in the commit message:
>> Optimization of the common case where CR3 was changed while no guest
>> page table changes happened in between.
>> For this optimization, we only need the change regarding
>> mmu_sync_roots() to not grab mmu_lock when
>> (!root->unsync && !root->unsync_children).
>> Am I missing something?
> 
> I still think that this should be split to 2 separate patches.
> The first only optimizing mmu_sync_roots() to not grab mmu_lock().
> The second optimizing to not free previous root SPT
> (as we are probably about to return to it because of how KPTI works).
> 

Ok. I can split out the mmu_sync_roots() change into a separate patch. But
please note that it alone doesn't make the CR3 switch lock-free. The second
patch would then be the actual fast CR3 switch patch, including the
fast_cr3_switch() function and the code to not free the previous root.

Thanks,
Junaid