Hi Suzuki On 5/15/2018 4:36 PM, Suzuki K Poulose Wrote: > > Hi Jia > > On 05/15/2018 03:03 AM, Jia He wrote: >> Hi Suzuki >> >> I will merge the other thread into this, and add the necessary CC list >> >> That WARN_ON call trace is very easy to reproduce in my armv8a server after I >> start 20 guests >> >> and run memhog in the host. Of course, ksm should be enabled >> >> For you question about my inject fault debug patch: > > > Thanks for the patch, comments below. > >> > > ... > >> index 7f6a944..ab8545e 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, >> * destroying the VM), otherwise another faulting VCPU may come in and mess >> * with things behind our backs. >> */ >> +extern int trigger_by_ksm; >> static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) >> { >> pgd_t *pgd; >> phys_addr_t addr = start, end = start + size; >> phys_addr_t next; >> >> + if(trigger_by_ksm) { >> + end -= 0x200; >> + } >> + >> assert_spin_locked(&kvm->mmu_lock); >> pgd = kvm->arch.pgd + stage2_pgd_index(addr); >> do { >> >> I need to point out that I never reproduced it without this debugging patch. > > That could trigger the panic iff your "size" <= 0x200, leading to the > condition (end < start), which can make the loop go forever, as we > do while(addr < end) and end up accessing something which may not be PGD entry > and thus get a bad page with bad numbers all around. This case could be hit only > with your change and the bug in the KSM which gives us an address near the page > boundary. No, I injected the fault on purpose to simulate the case when size is less than PAGE_SIZE(eg. PAGE_SIZE-0x200=65024) I ever got the panic info [1] *without* the debugging patch only once [1] https://lkml.org/lkml/2018/5/9/992 > > So, I think we can safely ignore the PANIC(). > More below. > > >>>> Suzuki, thanks for the comments. >>>> >>>> I proposed another ksm patch https://lkml.org/lkml/2018/5/3/1042 >>>> The root cause is ksm will add some extra flags to indicate that the page >>>> is in/not_in the stable tree. This makes address not be aligned with PAGE_SIZE. >>> Thanks for the pointer. In the future, please Cc the people relevant to the >>> discussion in the patches. >>> >>>> From arm kvm mmu point of view, do you think handle_hva_to_gpa still need >>>> to handle >>>> the unalignment case? >>> I don't think we should do that. Had we done this, we would never have caught >>> this bug >>> in KSM. Eventually if some other new implementation comes up with the a new >>> notifier >>> consumer which doesn't check alignment and doesn't WARN, it could simply do >>> the wrong >>> thing. So I believe what we have is a good measure to make sure that things are >>> in the right order. >>> >>>> IMO, the PAGE_SIZE alignment is still needed because we should not let the >>>> bottom function >>>> kvm_age_hva_handler to handle the exception. Please refer to the >>>> implementation in X86 and >>>> powerpc kvm_handle_hva_range(). They both aligned the hva with >>>> hva_to_gfn_memslot. >>>> >>> From an API perspective, you are passed on a "start" and "end" address. So, >>> you could potentially >>> do the wrong thing if you align the "start" and "end". May be those handlers >>> should also do the >>> same thing as we do. > >> But handle_hva_to_gpa has partially adjusted the alignment possibly: >> 1750 kvm_for_each_memslot(memslot, slots) { >> 1751 unsigned long hva_start, hva_end; >> 1752 gfn_t gpa; >> 1753 >> 1754 hva_start = max(start, memslot->userspace_addr); >> 1755 hva_end = min(end, memslot->userspace_addr + >> 1756 (memslot->npages << PAGE_SHIFT)); >> >> at line 1755, let us assume that end=0x12340200 and >> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT)=0x12340000 >> Then, hva_start is not page_size aligned and hva_end is aligned, and the size >> will be PAGE_SIZE-0x200, >> just as what I had done in the inject fault debugging patch. > > Thats because we want to limit the handling of the hva/gpa range by memslot. So, > we make sure we pass on the range within the given memslot > to hva_to_gfn_memslot(). But we do iterate over the next memslot if the > original range falls in to the next slot. So, in practice, there is no > alignment/trimming of the range. Its just that we pass on the appropriate range > for each slot. > Yes, I understand what the codes did in hva_to_gfn_memslot(). What I mean is hva_end may be changed and (hva_end - hva_start) will not be same as the parameter _size_ ? >ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data); Anyway, I have to admit that all the exceptions are originally caused by the STABLE_FLAG in ksm code. What I want to discuss here is how to make arm kvm handle the exception more gracefully. -- Cheers, Jia _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm