Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes

Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx> · Tue, 15 Apr 2014 19:11:24 +0800

Hi Marcelo,

Thanks your time to review it.

On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:

>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable since
>> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + *    can write protect sptes but flush tlb out mmu-lock that means we may use
>> + *    the corrupt tlb entries which depend on this bit.
>> + *
>> + * Both cases do not modify the status of  spte_is_locklessly_modifiable() so
>> + * if you want to check whether the spte is writable on MMU you can check
>> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
>> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
>> + * instead. See the comments in spte_has_volatile_bits() and
>> + * mmu_spte_update().
>> + */
>>  static inline int is_writable_pte(unsigned long pte)
>>  {
> 
> Xiao,
> 
> Can't get the SPTE_MMU_WRITEABLE part.
> 
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
> 
> spin_lock(mmu_lock);
> 
> if (writeable spte bit is set)
>     remove writeable spte bit from spte
> remote TLB flush            (*)
> action
> 
> spin_unlock(mmu_lock);
> 
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
> 
> Now what SPTE_MMU_WRITEABLE has to do with it ?

It is contained in "remove writeable spte bit from spte" which
is done by mmu_spte_update():

	if (spte_is_locklessly_modifiable(old_spte) &&
	      !is_writable_pte(new_spte))
		ret = true;
> 
> Perhaps a recipe like that (or just the rules) would be useful.

Okay, i am considering to improve this comments, how about like
this:

Currently, we have two sorts of write-protection, a) the first
one write-protects guest page to sync the guest modification,
b) another one is used to sync dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between these two sorts are:
1) the first case clears SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing tlb immediately avoiding
   corrupting shadow page table between all vcpus so it should
   be in the protection of mmu-lock. And the another case does
   not need to flush tlb until returning the dirty bitmap to
   userspace since it only write-protects the page logged in
   the bitmap, that means the page in the dirty bitmap is not
   missed, so it can flush tlb out of mmu-lock.

So, there is the problem: the first case can meet the corrupted
tlb caused by another case which write-protects pages but without
flush tlb immediately. In order to making the first case be aware
this problem we let it flush tlb if we try to write-protect
a spte whose SPTE_MMU_WRITEABLE bit is set, it works since another
case never touches SPTE_MMU_WRITEABLE bit.

Anyway, whenever a spte is updated (only permission and status bits
are changed) we need to check whether the spte with SPTE_MMU_WRITEABLE
becomes readonly, if that happens, we need to flush tlb. Fortunately,
mmu_spte_update() has already handled it perfectly.

The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to see if it has writable tlb entry or if the spte can
  be writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is
  the most case, otherwise
- if we fix page fault on the spte or do write-protection by dirty logging,
  check PT_WRITABLE_MASK.

TODO: introduce APIs to split these two cases.

> 
> The remaining patches look good.

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html