On 09/03/2014 08:29 PM, wanghaibin wrote:
> On 2014/9/4 7:01, Mario Smarduch wrote:
>
>> On 09/02/2014 07:42 PM, wanghaibin wrote:
>>> On 2014/8/27 8:04, Mario Smarduch wrote:
>>>
>>>> Patch adds support for initial write protection of VM memslot. This patch series
>>>> assumes that huge PUDs will not be used in 2nd stage tables, which is always
>>>> valid on ARMv7.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@xxxxxxxxxxx>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>>>  arch/arm/kvm/arm.c                    |    9 +++
>>>>  arch/arm/kvm/mmu.c                    |  128 ++++++++++++++++++++++++++++++++++
>>>>  5 files changed, 159 insertions(+)
>>>>
>> [...]
>>>> +
>>>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>>>  #define kvm_pgd_addr_end(addr, end)					\
>>>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>>>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>>>> index 85c60ad..d8bb40b 100644
>>>> --- a/arch/arm/include/asm/pgtable-3level.h
>>>> +++ b/arch/arm/include/asm/pgtable-3level.h
>>>> @@ -129,6 +129,7 @@
>>>>  #define L_PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)	/* HAP[1] */
>>>>  #define L_PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)	/* HAP[2:1] */
>>>>
>>>
>>> Hi Mario, in the L_PMD_S2_RDONLY define, the **pteval_t** type should be *pmdval_t*.
>>
>> Yes, it should; thanks for catching it.
>>>
>>>> +#define L_PMD_S2_RDONLY		(_AT(pteval_t, 1) << 6)	/* HAP[1] */
>>>
>>>>  #define L_PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)	/* HAP[2:1] */
>>>>
>>>>  /*
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index c52b2bd..e1be6c7 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>>  				   const struct kvm_memory_slot *old,
>>>>  				   enum kvm_mr_change change)
>>>>  {
>>>> +#ifdef CONFIG_ARM
>>>
>>> About CONFIG_ARM, I suggest dropping this limitation.
>>> I implemented a few interfaces for ARM64 and tested this patch using VNC; it works.
>>
>> Yes, I was planning to do that once the ARMv7 and generic/arch split for the
>> dirty log and TLB flush functions is settled. I realize that at this point
>> ARMv8 just needs its own TLB flush and it's done. But I need a strong
>> test case before claiming ARMv8 support.
>>
>> I have some ideas on testing it, but migration is probably the most
>> reliable test and really the only use case.
>>
>> Can you elaborate on your test environment for ARMv8? (You could
>> email off-line if it's too long. Also, thanks for testing!)
>
> Oh, I just created some devices (keyboard, mouse, CLCD, panel) and added the
> corresponding DTB nodes in mach-virt (qemu).
> PS: the kernel hasn't upstreamed the newest CLCD patch set yet, so we need to
> pull those patches into the guest.
>
> If you need them, I will send the patches to you!

Yes, please do.

>
>>
>>>> +	/*
>>>> +	 * At this point the memslot has been committed and there is an
>>>> +	 * allocated dirty_bitmap[]; dirty pages will be tracked while the
>>>> +	 * memory slot is write protected.
>>>> +	 */
>>>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>>> +#endif
>>>>  }
>>>>
>> [...]
>>>> +/**
>>>> + * stage2_wp_pud_range - write protect PUD range
>>>> + * @kvm:	pointer to kvm structure
>>>> + * @pgd:	pointer to pgd entry
>>>> + * @addr:	range start address
>>>> + * @end:	range end address
>>>> + *
>>>> + * While walking the PUD range, huge PUD pages are ignored.
>>>> + */
>>>> +static void stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>>>> +				phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pud_t *pud;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pud = pud_offset(pgd, addr);
>>>> +	do {
>>>> +		next = kvm_pud_addr_end(addr, end);
>>>
>>> There may be a problem.
>>> I tested the patch set using VNC; the video RAM which qemu emulates for the CLCD
>>> will be registered with KVM with the log dirty attribution.
>>
>> 'log dirty attribution'? What do you mean by that?
>>
>>> It is worth mentioning that the S2 page for the video RAM is not created at that time.
>>> So, there must be a BUG_ON(kvm_pud_huge(*pud)).
>>
>> 'not created at that time'? What time?
>>
>> The time when the memory region is registered should have no
>> bearing on logging; after the region is committed, logging
>> will start if its log flag is set.
>>
>>>
>>> I think this code should be:
>>>
>>>	if (!pud_none(*pud)) {		/* check that *pud is valid first */
>>>		BUG_ON(kvm_pud_huge(*pud));
>>>		stage2_wp_pmd_range(pud, addr, next);
>>>	}
>>
>> I may be missing something here, but I'm not sure how the update improves
>> anything. Definitely follow up if I'm off on this.
>>
>> Thanks.
>
> OK, what I mean is this.
> For qemu's emulated CLCD video RAM, here is the code:
>
>	memory_region_init_ram(vram, NULL, "virt.vram", size);
>	vmstate_register_ram_global(vram);
>	memory_region_add_subregion(sysmem, base, vram);
>	memory_region_set_log(vram, true, DIRTY_MEMORY_VGA);	/* mem->flag dirty log */
>
> Then qemu will call ioctl(KVM_SET_USER_MEMORY_REGION) with the dirty-log flag set
> in mem->flags, and afterwards kvm_mmu_wp_memory_region must be called to write
> protect this memslot's S2 pages.
>
> At this time the S2 tables are not yet created; that is, *pud == 0x0.
>
> So the point is:
> first, we should check whether *pud is valid (not none);
> second, check whether *pud is huge.

I see: check for a huge pud only if the pud is not NULL. Sounds good.

You also bring up a use case for dirty page logging other than migration: it
appears the QEMU VNC server uses it for screen updates on the client.

>
> Thanks.
>
>>>
>>> Thanks!
>>>
>>>> +		/* TODO: PUD not supported, revisit later if implemented */
>>>> +		BUG_ON(kvm_pud_huge(*pud));
>>>> +		if (!pud_none(*pud))
>>>> +			stage2_wp_pmd_range(pud, addr, next);
>>>> +	} while (pud++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * stage2_wp_range() - write protect stage2 memory region range
>>>> + * @kvm:	The KVM pointer
>>>> + * @addr:	Start address of range
>>>> + * @end:	End address of range
>>>> + */
>>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pgd_t *pgd;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>>> +	do {
>>>> +		/*
>>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>>> +		 * large. Otherwise, we may see kernel panics with
>>>> +		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
>>>> +		 * CONFIG_LOCKDEP. Additionally, holding the lock too long
>>>> +		 * will also starve other vCPUs.
>>>> +		 */
>>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>>> +			cond_resched_lock(&kvm->mmu_lock);
>>>> +
>>>> +		next = kvm_pgd_addr_end(addr, end);
>>>> +		if (pgd_present(*pgd))
>>>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>>>> +	} while (pgd++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>>>> + * @kvm:	The KVM pointer
>>>> + * @slot:	The memory slot to write protect
>>>> + *
>>>> + * Called to start logging dirty pages after the KVM_MEM_LOG_DIRTY_PAGES
>>>> + * flag is set on a memory region. After this function returns, all present
>>>> + * PMDs and PTEs in the memory region are write protected, and afterwards
>>>> + * the dirty page log can be read.
>>>> + *
>>>> + * Acquires kvm_mmu_lock. Called with the kvm->slots_lock mutex acquired,
>>>> + * serializing operations for VM memory regions.
>>>> + */
>>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>>> +{
>>>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>>>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>>>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>>>> +
>>>> +	spin_lock(&kvm->mmu_lock);
>>>> +	stage2_wp_range(kvm, start, end);
>>>> +	kvm_flush_remote_tlbs(kvm);
>>>> +	spin_unlock(&kvm->mmu_lock);
>>>> +}
>>>> +#endif
>>>> +
>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  struct kvm_memory_slot *memslot,
>>>>  			  unsigned long fault_status)

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm