Re: [PATCH v10 4/6] arm: KVM: Add initial dirty page locking infrastructure

On 09/03/2014 08:29 PM, wanghaibin wrote:
> On 2014/9/4 7:01, Mario Smarduch wrote:
> 
>> On 09/02/2014 07:42 PM, wanghaibin wrote:
>>> On 2014/8/27 8:04, Mario Smarduch wrote:
>>>
>>>> Patch adds support for initial write protection of VM memslots. This patch series
>>>> assumes that huge PUDs will not be used in 2nd stage tables, which is always
>>>> valid on ARMv7.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@xxxxxxxxxxx>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h       |   1 +
>>>>  arch/arm/include/asm/kvm_mmu.h        |  20 ++++++
>>>>  arch/arm/include/asm/pgtable-3level.h |   1 +
>>>>  arch/arm/kvm/arm.c                    |   9 +++
>>>>  arch/arm/kvm/mmu.c                    | 128 ++++++++++++++++++++++++++++++++++
>>>>  5 files changed, 159 insertions(+)
>>>>
>> [...]
>>>> +
>>>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>>>  #define kvm_pgd_addr_end(addr, end)					\
>>>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>>>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>>>> index 85c60ad..d8bb40b 100644
>>>> --- a/arch/arm/include/asm/pgtable-3level.h
>>>> +++ b/arch/arm/include/asm/pgtable-3level.h
>>>> @@ -129,6 +129,7 @@
>>>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>>>  
>>>
>>>
>>> Hi, Mario, in the L_PMD_S2_RDONLY define, the **pteval_t** type should be *pmdval_t*.
>>
>> Yes, it should. Thanks for catching it.
>>>
>>>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>
>>>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>>>  
>>>>  /*
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index c52b2bd..e1be6c7 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>>  				   const struct kvm_memory_slot *old,
>>>>  				   enum kvm_mr_change change)
>>>>  {
>>>> +#ifdef CONFIG_ARM
>>>
>>>
>>> About CONFIG_ARM, I suggest dropping this limit.
>>> I implemented a few interfaces for ARM64 and tested this patch set using VNC; it works.
>>>
>> Yes, I was planning to do that once the ARMv7 and generic/arch split for
>> the dirty log and TLB flush functions is settled. I realize that at this time
>> ARMv8 just needs its own TLB flush and it's done. But I need a strong
>> test case before claiming ARMv8 support.
>>
>> I have some ideas on testing it, but migration is probably the most
>> reliable, and really the only, use case.
>>
>> Can you elaborate on your test environment for ARMv8? (You could
>> email off-line if it's too long; also, thanks for testing!)
> 
> 
> Oh, I just created some devices (keyboard, mouse, CLCD, panel) and added the corresponding
> dtb nodes in mach-virt (QEMU).
> PS: the newest CLCD patch set hasn't been upstreamed into the kernel yet, so we need to pull those patches into the guest.
> 
> If you need it, I will send this patch to you!

Yes please do.

> 
>>
>>
>>
>>>> +	/*
>>>> +	 * At this point memslot has been committed and there is an
>>>> +	 * allocated dirty_bitmap[], dirty pages will be tracked while the
>>>> +	 * memory slot is write protected.
>>>> +	 */
>>>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>>> +#endif
>>>>  }
>>>>  
>> [...]
>>>> +/**
>>>> +  * stage2_wp_pud_range - write protect PUD range
>>>> +  * @kvm:	pointer to kvm structure
>>>> +  * @pgd:	pointer to pgd entry
>>>> +  * @addr:	range start address
>>>> +  * @end:	range end address
>>>> +  *
>>>> +  * While walking the PUD range huge PUD pages are ignored.
>>>> +  */
>>>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>>>> +					phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pud_t *pud;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pud = pud_offset(pgd, addr);
>>>> +	do {
>>>> +		next = kvm_pud_addr_end(addr, end);
>>>
>>>
>>> There may be a problem.
>>> I tested the patch set using VNC; the video RAM which QEMU emulates for the CLCD will be
>>> registered with KVM with the log dirty attribution.
>>
>> 'log dirty attribution'? What do you mean by that?
>>
>>> It is worth mentioning that the S2 page for the video RAM is not created at that time.
>>> So, the BUG_ON(kvm_pud_huge(*pud)) will trigger.
>>
>> 'not created at that time'? What time? 
>>
>> The time when the memory region is registered should have no
>> bearing on logging; after it's committed, logging
>> will start if its log flag is set.
>>
>>>
>>> I think this code should be :
>>>
>>> 		if (!pud_none(*pud)) {    /* check the *pud valid first */
>>> 			BUG_ON(kvm_pud_huge(*pud));
>>> 			stage2_wp_pmd_range(pud, addr, next);
>>> 		}
>>
>> I may be missing something here, but I'm not sure how the update improves
>> anything. Definitely follow up if I'm off on this.
>>
>> Thanks.
> 
> 
> OK, I mean this:
> 	In QEMU's CLCD video RAM emulation, here is the code:
> 		
>  	    memory_region_init_ram(vram, NULL, "virt.vram", size);
>     	    vmstate_register_ram_global(vram);
>     	    memory_region_add_subregion(sysmem, base, vram);
>     	    memory_region_set_log(vram, true, DIRTY_MEMORY_VGA);  /* mem->flag dirty log*/
> 
> 	Then, QEMU will call ioctl(KVM_SET_USER_MEMORY_REGION) with the dirty-log attribution
> 	in mem->flags, and afterwards kvm_mmu_wp_memory_region must be called to
> 	write-protect this memslot's S2 pages.
> 
> 	I mean that at this time the S2 mapping is not yet created, that is, *pud == 0x0.
> 
> So, maybe the fix for this problem is:
> 	first: check whether *pud is valid or none;
> 	second: check whether *pud is huge or not.
I see: check for a huge pud only if the pud is not none. Sounds good.

You also bring up a use case for dirty page logging other than
migration: it appears the QEMU VNC server uses it for screen updates
on the client.

> 
> Thanks.
> 
>>>
>>>
>>> Thanks !
>>>
>>>> +		/* TODO: PUD not supported, revisit later if implemented  */
>>>> +		BUG_ON(kvm_pud_huge(*pud));
>>>> +		if (!pud_none(*pud))
>>>> +			stage2_wp_pmd_range(pud, addr, next);
>>>> +	} while (pud++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * stage2_wp_range() - write protect stage2 memory region range
>>>> + * @kvm:	The KVM pointer
>>>> + * @start:	Start address of range
>>>> + * @end:	End address of range
>>>> + */
>>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pgd_t *pgd;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>>> +	do {
>>>> +		/*
>>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>>> +		 * large. Otherwise, we may see kernel panics with
>>>> +		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCK_DETECTOR,
>>>> +		 * CONFIG_LOCK_DEP. Additionally, holding the lock too long
>>>> +		 * will also starve other vCPUs.
>>>> +		 */
>>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>>> +			cond_resched_lock(&kvm->mmu_lock);
>>>> +
>>>> +		next = kvm_pgd_addr_end(addr, end);
>>>> +		if (pgd_present(*pgd))
>>>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>>>> +	} while (pgd++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>>>> + * @kvm:	The KVM pointer
>>>> + * @slot:	The memory slot to write protect
>>>> + *
>>>> + * Called to start logging dirty pages after memory region
>>>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>>>> + * all present PMD and PTEs are write protected in the memory region.
>>>> + * Afterwards read of dirty page log can be called.
>>>> + *
>>>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>>>> + * serializing operations for VM memory regions.
>>>> + */
>>>> +
>>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>>> +{
>>>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>>>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>>>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>>>> +
>>>> +	spin_lock(&kvm->mmu_lock);
>>>> +	stage2_wp_range(kvm, start, end);
>>>> +	kvm_flush_remote_tlbs(kvm);
>>>> +	spin_unlock(&kvm->mmu_lock);
>>>> +}
>>>> +#endif
>>>> +
>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  struct kvm_memory_slot *memslot,
>>>>  			  unsigned long fault_status)
>>>
>>>
>>>
>>
>>
>> .
>>
> 
> 
> 

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



