On 2014/9/4 7:01, Mario Smarduch wrote:
> On 09/02/2014 07:42 PM, wanghaibin wrote:
>> On 2014/8/27 8:04, Mario Smarduch wrote:
>>
>>> This patch adds support for initial write protection of VM memslots. The patch series
>>> assumes that huge PUDs will not be used in 2nd stage tables, which is always
>>> valid on ARMv7.
>>>
>>> Signed-off-by: Mario Smarduch <m.smarduch@xxxxxxxxxxx>
>>> ---
>>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>>  arch/arm/kvm/arm.c                    |    9 +++
>>>  arch/arm/kvm/mmu.c                    |  128 ++++++++++++++++++++++++++++++++++
>>>  5 files changed, 159 insertions(+)
>>>
> [...]
>>> +
>>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>> +{
>>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>> +}
>>> +
>>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>> +{
>>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>> +}
>>> +
>>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>>  #define kvm_pgd_addr_end(addr, end)					\
>>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;	\
>>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>>> index 85c60ad..d8bb40b 100644
>>> --- a/arch/arm/include/asm/pgtable-3level.h
>>> +++ b/arch/arm/include/asm/pgtable-3level.h
>>> @@ -129,6 +129,7 @@
>>>  #define L_PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>  #define L_PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>>
>>
>> Hi Mario, in the L_PMD_S2_RDONLY define, the **pteval_t** type should be *pmdval_t*.
>
> Yes, it should. Thanks for catching it.
>
>>
>>> +#define L_PMD_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>
>>> +#define L_PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>>
>>>  /*
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index c52b2bd..e1be6c7 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>  				   const struct kvm_memory_slot *old,
>>>  				   enum kvm_mr_change change)
>>>  {
>>> +#ifdef CONFIG_ARM
>>
>> About CONFIG_ARM, I suggest dropping this restriction.
>> I just implemented a few interfaces for ARM64 and tested this patch using VNC; it works.
>>
> Yes, I was planning to do that once the ARMv7 and generic/arch split for
> dirty log and TLB flush functions is settled. I realize that at this point
> ARMv8 just needs its own TLB flush and it's done. But I need a strong
> test case before claiming ARMv8 support.
>
> I have some ideas on testing it, but migration is probably the most
> reliable and really the only use case.
>
> Can you elaborate on your test environment for ARMv8? (You could
> email off-line if it's too long. Also, thanks for testing!)

Oh, I just created some devices (keyboard, mouse, CLCD, panel) and added the
corresponding dtb nodes in mach-virt (QEMU).
P.S.: the newest CLCD patch set has not been merged upstream yet, so we need
to pull those patches into the guest. If you need them, I will send them to you!

>
>
>>> +	/*
>>> +	 * At this point the memslot has been committed and there is an
>>> +	 * allocated dirty_bitmap[]; dirty pages will be tracked while the
>>> +	 * memory slot is write protected.
>>> +	 */
>>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>> +#endif
>>>  }
>>>
> [...]
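Coming back to the pgtable-3level.h hunk above, for completeness here is a
sketch of the corrected defines; only the _AT() type argument of
L_PMD_S2_RDONLY changes, from pteval_t to pmdval_t, to match its RDWR
counterpart:

	#define L_PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[1]   */
	#define L_PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
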
>>> +/**
>>> + * stage2_wp_pud_range - write protect PUD range
>>> + * @kvm:	pointer to kvm structure
>>> + * @pgd:	pointer to pgd entry
>>> + * @addr:	range start address
>>> + * @end:	range end address
>>> + *
>>> + * While walking the PUD range, huge PUD pages are ignored.
>>> + */
>>> +static void stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>>> +				phys_addr_t addr, phys_addr_t end)
>>> +{
>>> +	pud_t *pud;
>>> +	phys_addr_t next;
>>> +
>>> +	pud = pud_offset(pgd, addr);
>>> +	do {
>>> +		next = kvm_pud_addr_end(addr, end);
>>
>> There may be a problem here.
>> I tested the patch set using VNC; the video RAM that QEMU emulates for the CLCD
>> is registered with KVM with the log dirty attribution.
>
> 'log dirty attribution' -- what do you mean by that?
>
>> It is worth mentioning that the S2 page tables for the video RAM are not created at that time.
>> So we will hit BUG_ON(kvm_pud_huge(*pud)).
>
> 'not created at that time'? What time?
>
> The time when the memory region is registered should have no
> bearing on logging; after it's committed, logging
> will start if its log flag is set.
>
>>
>> I think this code should be:
>>
>> 	if (!pud_none(*pud)) {		/* check that *pud is valid first */
>> 		BUG_ON(kvm_pud_huge(*pud));
>> 		stage2_wp_pmd_range(pud, addr, next);
>> 	}
>
> I may be missing something here, but I'm not sure how the update improves
> anything. Definitely follow up if I'm off on this.
>
> Thanks.

OK, what I mean is this: for the CLCD video RAM emulated in QEMU, here is the code:

	memory_region_init_ram(vram, NULL, "virt.vram", size);
	vmstate_register_ram_global(vram);
	memory_region_add_subregion(sysmem, base, vram);
	memory_region_set_log(vram, true, DIRTY_MEMORY_VGA);	/* mem->flags: dirty log */

Then QEMU issues ioctl(KVM_SET_USER_MEMORY_REGION) with the dirty log flag set
in mem->flags, and afterwards kvm_mmu_wp_memory_region() must be called to
write protect this memslot's S2 pages.
At this time the S2 tables have not been created yet; that is, *pud == 0x0.

So the point is:
first, we should check whether *pud is valid (none or not);
second, check whether *pud is huge or not.

Thanks.

>>
>>
>> Thanks!
>>
>>> +		/* TODO: PUD not supported, revisit later if implemented */
>>> +		BUG_ON(kvm_pud_huge(*pud));
>>> +		if (!pud_none(*pud))
>>> +			stage2_wp_pmd_range(pud, addr, next);
>>> +	} while (pud++, addr = next, addr != end);
>>> +}
>>> +
>>> +/**
>>> + * stage2_wp_range() - write protect stage2 memory region range
>>> + * @kvm:	The KVM pointer
>>> + * @addr:	Start address of range
>>> + * @end:	End address of range
>>> + */
>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>> +{
>>> +	pgd_t *pgd;
>>> +	phys_addr_t next;
>>> +
>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>> +	do {
>>> +		/*
>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>> +		 * large. Otherwise, we may see kernel panics with
>>> +		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
>>> +		 * CONFIG_LOCKDEP. Additionally, holding the lock too long
>>> +		 * will also starve other vCPUs.
>>> +		 */
>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>> +			cond_resched_lock(&kvm->mmu_lock);
>>> +
>>> +		next = kvm_pgd_addr_end(addr, end);
>>> +		if (pgd_present(*pgd))
>>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>>> +	} while (pgd++, addr = next, addr != end);
>>> +}
>>> +
>>> +/**
>>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>>> + * @kvm:	The KVM pointer
>>> + * @slot:	The memory slot to write protect
>>> + *
>>> + * Called to start logging dirty pages after the memory region's
>>> + * KVM_MEM_LOG_DIRTY_PAGES flag is set. After this function returns,
>>> + * all present PMDs and PTEs in the memory region are write protected.
>>> + * Afterwards the dirty page log can be read.
>>> + *
>>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>>> + * serializing operations for VM memory regions.
>>> + */
>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>> +{
>>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>>> +
>>> +	spin_lock(&kvm->mmu_lock);
>>> +	stage2_wp_range(kvm, start, end);
>>> +	kvm_flush_remote_tlbs(kvm);
>>> +	spin_unlock(&kvm->mmu_lock);
>>> +}
>>> +#endif
>>> +
>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>  			  struct kvm_memory_slot *memslot,
>>>  			  unsigned long fault_status)
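Folding the suggestion from this thread back into the patch, the PUD walk would
become something like the sketch below. Only the ordering changes: the
pud_none() check now guards the huge PUD assertion, so a not-yet-populated
entry (the CLCD video RAM case above) is simply skipped:

	static void stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
					phys_addr_t addr, phys_addr_t end)
	{
		pud_t *pud;
		phys_addr_t next;

		pud = pud_offset(pgd, addr);
		do {
			next = kvm_pud_addr_end(addr, end);
			if (!pud_none(*pud)) {
				/* TODO: PUD not supported, revisit later if implemented */
				BUG_ON(kvm_pud_huge(*pud));
				stage2_wp_pmd_range(pud, addr, next);
			}
		} while (pud++, addr = next, addr != end);
	}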
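Finally, to illustrate the userspace half of the flow discussed above (QEMU's
memory_region_set_log() eventually takes this path), here is a minimal
hypothetical sketch that registers a slot with dirty logging enabled and later
fetches the log. vm_fd is assumed to come from KVM_CREATE_VM and mem from a
page-aligned allocation; the slot number, base, and size are made up for
illustration. The ioctls and the KVM_MEM_LOG_DIRTY_PAGES flag are the standard
KVM userspace API:

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	#define SLOT		0
	#define GUEST_BASE	0x80000000UL	/* hypothetical guest-physical base */
	#define SLOT_SIZE	(1UL << 20)	/* hypothetical 1 MiB region */

	/* Register the slot with logging on; the commit write protects its S2 entries. */
	static int enable_dirty_logging(int vm_fd, void *mem)
	{
		struct kvm_userspace_memory_region region = {
			.slot            = SLOT,
			.flags           = KVM_MEM_LOG_DIRTY_PAGES,
			.guest_phys_addr = GUEST_BASE,
			.memory_size     = SLOT_SIZE,
			.userspace_addr  = (unsigned long)mem,
		};

		if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
			perror("KVM_SET_USER_MEMORY_REGION");
			return -1;
		}
		return 0;
	}

	/* Later: fetch the slot's dirty bitmap, one bit per guest page. */
	static int read_dirty_log(int vm_fd, unsigned long *bitmap)
	{
		struct kvm_dirty_log log = { .slot = SLOT };

		log.dirty_bitmap = bitmap;
		if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
			perror("KVM_GET_DIRTY_LOG");
			return -1;
		}
		return 0;
	}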