On 2014/9/4 7:01, Mario Smarduch wrote:
> On 09/02/2014 07:42 PM, wanghaibin wrote:
>> On 2014/8/27 8:04, Mario Smarduch wrote:
>>
>>> This patch adds support for initial write protection of VM memslots. The patch series
>>> assumes that huge PUDs will not be used in 2nd stage tables, which is always
>>> valid on ARMv7.
>>>
>>> Signed-off-by: Mario Smarduch <m.smarduch@xxxxxxxxxxx>
>>> ---
>>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>>  arch/arm/kvm/arm.c                    |    9 +++
>>>  arch/arm/kvm/mmu.c                    |  128 ++++++++++++++++++++++++++++++++++
>>>  5 files changed, 159 insertions(+)
>>>
> [...]
>>> +
>>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>> +{
>>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>> +}
>>> +
>>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>> +{
>>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>> +}
>>> +
>>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>>  #define kvm_pgd_addr_end(addr, end)					\
>>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;	\
>>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>>> index 85c60ad..d8bb40b 100644
>>> --- a/arch/arm/include/asm/pgtable-3level.h
>>> +++ b/arch/arm/include/asm/pgtable-3level.h
>>> @@ -129,6 +129,7 @@
>>>  #define L_PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>  #define L_PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>>
>>
>> Hi Mario, in the L_PMD_S2_RDONLY define, the **pteval_t** type should be *pmdval_t*.
>
> Yes, it should. Thanks for catching it.
>
>>
>>> +#define L_PMD_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>
>>> +#define L_PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>>
>>>  /*
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index c52b2bd..e1be6c7 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>  				   const struct kvm_memory_slot *old,
>>>  				   enum kvm_mr_change change)
>>>  {
>>> +#ifdef CONFIG_ARM
>>
>> About CONFIG_ARM, I suggest dropping this restriction.
>> I just implemented a few interfaces for ARM64 and tested this patch using VNC; it works.
>>
> Yes, I was planning to do that once the ARMv7 and generic/arch split for
> dirty log and TLB flush functions is settled. I realize that at this point
> ARMv8 just needs its own TLB flush and it's done. But I need a strong
> test case before claiming ARMv8 support.
>
> I have some ideas on testing it, but migration is probably the most
> reliable and really the only use case.
>
> Can you elaborate on your test environment for ARMv8? (You could
> email off-line if it's too long. Also, thanks for testing!)

Oh, I just created some devices (keyboard, mouse, CLCD, panel) and added the
corresponding dtb nodes in mach-virt (QEMU).
P.S.: the newest CLCD patch set has not been merged upstream yet, so we need
to pull those patches into the guest. If you need them, I will send them to you!

>
>
>>> +	/*
>>> +	 * At this point the memslot has been committed and there is an
>>> +	 * allocated dirty_bitmap[]; dirty pages will be tracked while the
>>> +	 * memory slot is write protected.
>>> +	 */
>>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>> +#endif
>>>  }
>>>
> [...]
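Coming back to the pgtable-3level.h hunk above, for completeness here is a
sketch of the corrected defines; only the _AT() type argument of
L_PMD_S2_RDONLY changes, from pteval_t to pmdval_t, to match its RDWR
counterpart:

	#define L_PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[1]   */
	#define L_PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
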
>>> +/**
>>> + * stage2_wp_pud_range - write protect PUD range
>>> + * @kvm:	pointer to kvm structure
>>> + * @pgd:	pointer to pgd entry
>>> + * @addr:	range start address
>>> + * @end:	range end address
>>> + *
>>> + * While walking the PUD range, huge PUD pages are ignored.
>>> + */
>>> +static void stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>>> +				phys_addr_t addr, phys_addr_t end)
>>> +{
>>> +	pud_t *pud;
>>> +	phys_addr_t next;
>>> +
>>> +	pud = pud_offset(pgd, addr);
>>> +	do {
>>> +		next = kvm_pud_addr_end(addr, end);
>>
>> There may be a problem here.
>> I tested the patch set using VNC; the video RAM that QEMU emulates for the CLCD
>> is registered with KVM with the log dirty attribution.
>
> 'log dirty attribution' -- what do you mean by that?
>
>> It is worth mentioning that the S2 page tables for the video RAM are not created at that time.
>> So we will hit BUG_ON(kvm_pud_huge(*pud)).
>
> 'not created at that time'? What time?
>
> The time when the memory region is registered should have no
> bearing on logging; after it's committed, logging
> will start if its log flag is set.
>
>>
>> I think this code should be:
>>
>> 	if (!pud_none(*pud)) {		/* check that *pud is valid first */
>> 		BUG_ON(kvm_pud_huge(*pud));
>> 		stage2_wp_pmd_range(pud, addr, next);
>> 	}
>
> I may be missing something here, but I'm not sure how the update improves
> anything. Definitely follow up if I'm off on this.
>
> Thanks.

OK, what I mean is this: for the CLCD video RAM emulated in QEMU, here is the code:

	memory_region_init_ram(vram, NULL, "virt.vram", size);
	vmstate_register_ram_global(vram);
	memory_region_add_subregion(sysmem, base, vram);
	memory_region_set_log(vram, true, DIRTY_MEMORY_VGA);	/* mem->flags: dirty log */

Then QEMU issues ioctl(KVM_SET_USER_MEMORY_REGION) with the dirty log flag set
in mem->flags, and afterwards kvm_mmu_wp_memory_region() must be called to
write protect this memslot's S2 pages.
At this time the S2 tables have not been created yet; that is, *pud == 0x0.

So the point is:
first, we should check whether *pud is valid (none or not);
second, check whether *pud is huge or not.

Thanks.

>>
>>
>> Thanks!
>>
>>> +		/* TODO: PUD not supported, revisit later if implemented */
>>> +		BUG_ON(kvm_pud_huge(*pud));
>>> +		if (!pud_none(*pud))
>>> +			stage2_wp_pmd_range(pud, addr, next);
>>> +	} while (pud++, addr = next, addr != end);
>>> +}
>>> +
>>> +/**
>>> + * stage2_wp_range() - write protect stage2 memory region range
>>> + * @kvm:	The KVM pointer
>>> + * @addr:	Start address of range
>>> + * @end:	End address of range
>>> + */
>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>> +{
>>> +	pgd_t *pgd;
>>> +	phys_addr_t next;
>>> +
>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>> +	do {
>>> +		/*
>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>> +		 * large. Otherwise, we may see kernel panics with
>>> +		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
>>> +		 * CONFIG_LOCKDEP. Additionally, holding the lock too long
>>> +		 * will also starve other vCPUs.
>>> +		 */
>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>> +			cond_resched_lock(&kvm->mmu_lock);
>>> +
>>> +		next = kvm_pgd_addr_end(addr, end);
>>> +		if (pgd_present(*pgd))
>>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>>> +	} while (pgd++, addr = next, addr != end);
>>> +}
>>> +
>>> +/**
>>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>>> + * @kvm:	The KVM pointer
>>> + * @slot:	The memory slot to write protect
>>> + *
>>> + * Called to start logging dirty pages after the memory region's
>>> + * KVM_MEM_LOG_DIRTY_PAGES flag is set. After this function returns,
>>> + * all present PMDs and PTEs in the memory region are write protected.
>>> + * Afterwards the dirty page log can be read.
>>> + *
>>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>>> + * serializing operations for VM memory regions.
>>> + */
>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>> +{
>>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>>> +
>>> +	spin_lock(&kvm->mmu_lock);
>>> +	stage2_wp_range(kvm, start, end);
>>> +	kvm_flush_remote_tlbs(kvm);
>>> +	spin_unlock(&kvm->mmu_lock);
>>> +}
>>> +#endif
>>> +
>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>  			  struct kvm_memory_slot *memslot,
>>>  			  unsigned long fault_status)
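Folding the suggestion from this thread back into the patch, the PUD walk would
become something like the sketch below. Only the ordering changes: the
pud_none() check now guards the huge PUD assertion, so a not-yet-populated
entry (the CLCD video RAM case above) is simply skipped:

	static void stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
					phys_addr_t addr, phys_addr_t end)
	{
		pud_t *pud;
		phys_addr_t next;

		pud = pud_offset(pgd, addr);
		do {
			next = kvm_pud_addr_end(addr, end);
			if (!pud_none(*pud)) {
				/* TODO: PUD not supported, revisit later if implemented */
				BUG_ON(kvm_pud_huge(*pud));
				stage2_wp_pmd_range(pud, addr, next);
			}
		} while (pud++, addr = next, addr != end);
	}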
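Finally, to illustrate the userspace half of the flow discussed above (QEMU's
memory_region_set_log() eventually takes this path), here is a minimal
hypothetical sketch that registers a slot with dirty logging enabled and later
fetches the log. vm_fd is assumed to come from KVM_CREATE_VM and mem from a
page-aligned allocation; the slot number, base, and size are made up for
illustration. The ioctls and the KVM_MEM_LOG_DIRTY_PAGES flag are the standard
KVM userspace API:

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	#define SLOT		0
	#define GUEST_BASE	0x80000000UL	/* hypothetical guest-physical base */
	#define SLOT_SIZE	(1UL << 20)	/* hypothetical 1 MiB region */

	/* Register the slot with logging on; the commit write protects its S2 entries. */
	static int enable_dirty_logging(int vm_fd, void *mem)
	{
		struct kvm_userspace_memory_region region = {
			.slot            = SLOT,
			.flags           = KVM_MEM_LOG_DIRTY_PAGES,
			.guest_phys_addr = GUEST_BASE,
			.memory_size     = SLOT_SIZE,
			.userspace_addr  = (unsigned long)mem,
		};

		if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
			perror("KVM_SET_USER_MEMORY_REGION");
			return -1;
		}
		return 0;
	}

	/* Later: fetch the slot's dirty bitmap, one bit per guest page. */
	static int read_dirty_log(int vm_fd, unsigned long *bitmap)
	{
		struct kvm_dirty_log log = { .slot = SLOT };

		log.dirty_bitmap = bitmap;
		if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
			perror("KVM_GET_DIRTY_LOG");
			return -1;
		}
		return 0;
	}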