On 5/15/2018 8:38 PM, Jia He Wrote: > Hi Suzuki > > On 5/15/2018 4:36 PM, Suzuki K Poulose Wrote: >> >> Hi Jia >> >> On 05/15/2018 03:03 AM, Jia He wrote: >>> Hi Suzuki >>> >>> I will merge the other thread into this, and add the necessary CC list >>> >>> That WARN_ON call trace is very easy to reproduce in my armv8a server after I >>> start 20 guests >>> >>> and run memhog in the host. Of course, ksm should be enabled >>> >>> For you question about my inject fault debug patch: >> >> >> Thanks for the patch, comments below. >> >>> >> >> ... >> >>> index 7f6a944..ab8545e 100644 >>> --- a/virt/kvm/arm/mmu.c >>> +++ b/virt/kvm/arm/mmu.c >>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, >>> * destroying the VM), otherwise another faulting VCPU may come in and mess >>> * with things behind our backs. >>> */ >>> +extern int trigger_by_ksm; >>> static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) >>> { >>> pgd_t *pgd; >>> phys_addr_t addr = start, end = start + size; >>> phys_addr_t next; >>> >>> + if(trigger_by_ksm) { >>> + end -= 0x200; >>> + } >>> + >>> assert_spin_locked(&kvm->mmu_lock); >>> pgd = kvm->arch.pgd + stage2_pgd_index(addr); >>> do { >>> >>> I need to point out that I never reproduced it without this debugging patch. >> >> That could trigger the panic iff your "size" <= 0x200, leading to the >> condition (end < start), which can make the loop go forever, as we >> do while(addr < end) and end up accessing something which may not be PGD entry >> and thus get a bad page with bad numbers all around. This case could be hit only >> with your change and the bug in the KSM which gives us an address near the page >> boundary. > No, I injected the fault on purpose to simulate the case when size is less than > PAGE_SIZE(eg. PAGE_SIZE-0x200=65024) > I ever got the panic info [1] *without* the debugging patch only once > > [1] https://lkml.org/lkml/2018/5/9/992 >> >> So, I think we can safely ignore the PANIC(). >> More below. >> >> >>>>> Suzuki, thanks for the comments. >>>>> >>>>> I proposed another ksm patch https://lkml.org/lkml/2018/5/3/1042 >>>>> The root cause is ksm will add some extra flags to indicate that the page >>>>> is in/not_in the stable tree. This makes address not be aligned with PAGE_SIZE. >>>> Thanks for the pointer. In the future, please Cc the people relevant to the >>>> discussion in the patches. >>>> >>>>> From arm kvm mmu point of view, do you think handle_hva_to_gpa still need >>>>> to handle >>>>> the unalignment case? >>>> I don't think we should do that. Had we done this, we would never have caught >>>> this bug >>>> in KSM. Eventually if some other new implementation comes up with the a new >>>> notifier >>>> consumer which doesn't check alignment and doesn't WARN, it could simply do >>>> the wrong >>>> thing. So I believe what we have is a good measure to make sure that things are >>>> in the right order. >>>> >>>>> IMO, the PAGE_SIZE alignment is still needed because we should not let the >>>>> bottom function >>>>> kvm_age_hva_handler to handle the exception. Please refer to the >>>>> implementation in X86 and >>>>> powerpc kvm_handle_hva_range(). They both aligned the hva with >>>>> hva_to_gfn_memslot. >>>>> >>>> From an API perspective, you are passed on a "start" and "end" address. So, >>>> you could potentially >>>> do the wrong thing if you align the "start" and "end". May be those handlers >>>> should also do the >>>> same thing as we do. >> >>> But handle_hva_to_gpa has partially adjusted the alignment possibly: >>> 1750 kvm_for_each_memslot(memslot, slots) { >>> 1751 unsigned long hva_start, hva_end; >>> 1752 gfn_t gpa; >>> 1753 >>> 1754 hva_start = max(start, memslot->userspace_addr); >>> 1755 hva_end = min(end, memslot->userspace_addr + >>> 1756 (memslot->npages << PAGE_SHIFT)); >>> >>> at line 1755, let us assume that end=0x12340200 and >>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)=0x12340000 >>> Then, hva_start is not page_size aligned and hva_end is aligned, and the size >>> will be PAGE_SIZE-0x200, >>> just as what I had done in the inject fault debugging patch. >> >> Thats because we want to limit the handling of the hva/gpa range by memslot. So, >> we make sure we pass on the range within the given memslot >> to hva_to_gfn_memslot(). But we do iterate over the next memslot if the >> original range falls in to the next slot. So, in practice, there is no >> alignment/trimming of the range. Its just that we pass on the appropriate range >> for each slot. >> > Yes, I understand what the codes did in hva_to_gfn_memslot(). What I mean is > hva_end may be changed and (hva_end - hva_start) will not be same as the > parameter _size_ ? > >> ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data); > > Anyway, I have to admit that all the exceptions are originally caused by the > STABLE_FLAG in ksm code. What I want to discuss here is how to make arm kvm > handle the exception more gracefully. > Hi Suzuki How about this patch (balance of avoiding the WARN_ON storm and debugging convenience): diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 7f6a944..4033946 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -297,6 +297,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) phys_addr_t next; assert_spin_locked(&kvm->mmu_lock); + + WARN_ON_ONCE(size & ~PAGE_MASK); pgd = kvm->arch.pgd + stage2_pgd_index(addr); do { /* @@ -1792,7 +1794,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data { pte_t *pte = (pte_t *)data; - WARN_ON(size != PAGE_SIZE); + WARN_ON_ONCE(size != PAGE_SIZE); /* * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE * flag clear because MMU notifiers will have unmapped a huge PMD before @@ -1823,7 +1825,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); + WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE); pmd = stage2_get_pmd(kvm, NULL, gpa); if (!pmd || pmd_none(*pmd)) /* Nothing there */ return 0; @@ -1843,7 +1845,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void * pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); + WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE); pmd = stage2_get_pmd(kvm, NULL, gpa); if (!pmd || pmd_none(*pmd)) /* Nothing there */ return 0; -- Cheers, Jia _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm