Re: [PATCH 1/1] ksm: introduce ksm_max_page_sharing per page deduplication limit

Hi Andrea,

On Thu, Sep 22, 2016 at 6:48 PM, Gavin Guo <gavin.guo@xxxxxxxxxxxxx> wrote:
>
> Hi Andrea,
>
> On Wed, Sep 21, 2016 at 11:34 PM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> > Hello Gavin,
> >
> > On Wed, Sep 21, 2016 at 11:12:19PM +0800, Gavin Guo wrote:
> >> Recently, a similar bug has also been observed with the numad process
> >> on both the v4.4 Ubuntu kernel and the latest upstream kernel. I
> >> think the patch should help mitigate the symptom. I searched the
> >> mailing list and found that the patch was never merged into the
> >> upstream kernel. Were there any problems that caused the patch to be
> >> dropped?
> >
> > Zero known problems; in fact it has been running in production in both
> > RHEL7 and RHEL6 for a while, and RHEL customers are no longer
> > affected.
> >
> > It's a critical computational complexity fix for anyone using KSM in
> > enterprise production. Hugh already Acked it as well.
> >
> > It's included in -mm and Andrew submitted it upstream once, but it
> > bounced, probably because it was not the right time in the merge
> > window cycle.
> >
> > Or perhaps because it's complex; I wouldn't know how to simplify it,
> > and there's no bug at all in the code.
> >
> > I would suggest that Andrew send it once again when he feels it's a
> > good time to do so.
> >
> >> The numad process tried to migrate a qemu process with 33GB of memory.
> >> Eventually, it got stuck in the csd_lock_wait function, which caused the
> >> qemu process to hang and the virtual machine to show high CPU usage and
> >> hang as well. With KSM disabled, the symptom disappeared.
> >
> > Until it's merged upstream, you can cherry-pick these three commits
> > from my aa.git tree:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=9384142e4ce830898abcefc4f0479c4533fa5bbc
> > https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=4b293be7e20c8e8731a4fdc3c3bf6047304d0cc8
> > https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=44c0d79c2c223c54ffe3fabc893963fc5963d611
> >
> > They're in -mm too.
> >
> >> What happens here is that do_migrate_pages (frame #10) acquires the
> >> mmap_sem semaphore that everything else is waiting for (and that
> >> waiting eventually produces the hang warnings), and it holds that
> >> semaphore for the duration of the page migration.
> >>
> >> crash> bt 2950
> >> PID: 2950   TASK: ffff885f97745280  CPU: 49  COMMAND: "numad"
> >>     [exception RIP: smp_call_function_single+219]
> >>     RIP: ffffffff81103a0b  RSP: ffff885f8fb4fb28  RFLAGS: 00000202
> >>     RAX: 0000000000000000  RBX: 0000000000000013  RCX: 0000000000000000
> >>     RDX: 0000000000000003  RSI: 0000000000000100  RDI: 0000000000000286
> >>     RBP: ffff885f8fb4fb70   R8: 0000000000000000   R9: 0000000000080000
> >>     R10: 0000000000000000  R11: ffff883faf917c88  R12: ffffffff810725f0
> >>     R13: 0000000000000013  R14: ffffffff810725f0  R15: ffff885f8fb4fbc8
> >>     CS: 0010  SS: 0018
> >>  #0 [ffff885f8fb4fb30] kvm_unmap_rmapp at ffffffffc01f1c3e [kvm]
> >>  #1 [ffff885f8fb4fb78] smp_call_function_many at ffffffff81103db3
> >>  #2 [ffff885f8fb4fbc0] native_flush_tlb_others at ffffffff8107279d
> >>  #3 [ffff885f8fb4fc08] flush_tlb_page at ffffffff81072a95
> >>  #4 [ffff885f8fb4fc30] ptep_clear_flush at ffffffff811d048e
> >>  #5 [ffff885f8fb4fc60] try_to_unmap_one at ffffffff811cb1c7
> >>  #6 [ffff885f8fb4fcd0] rmap_walk_ksm at ffffffff811e6f91
> >>  #7 [ffff885f8fb4fd28] rmap_walk at ffffffff811cc1bf
> >>  #8 [ffff885f8fb4fd80] try_to_unmap at ffffffff811cc46b
> >>  #9 [ffff885f8fb4fdc8] migrate_pages at ffffffff811f26d8
> >> #10 [ffff885f8fb4fe80] do_migrate_pages at ffffffff811e15f7
> >> #11 [ffff885f8fb4fef8] sys_migrate_pages at ffffffff811e187d
> >> #12 [ffff885f8fb4ff50] entry_SYSCALL_64_fastpath at ffffffff818244f2
> >>
> >> After some investigation, I disassembled the coredump and found that
> >> the stable_node->hlist is 2306920 entries long.
> >
> > Yep, this is definitely fixed by the three commits above, and the
> > problem is in rmap_walk_ksm, as you found above. With them applied,
> > you can never run into hangs anymore with KSM enabled, no matter the
> > workload and the amount of memory in guest and host.
> >
> > numad isn't required to reproduce it, some swapping is enough.
> >
> > It limits the de-duplication factor to 256, like a x256 compression
> > factor, which is clearly more than enough. So the list you found to be
> > too long effectively gets hard-limited to 256 entries with my patch
> > applied. The limit is configurable at runtime:
> >
> > /* Maximum number of page slots sharing a stable node */
> > static int ksm_max_page_sharing = 256;
> >
> > If you want to increase the limit (careful: that will increase
> > the rmap_walk_ksm computation time) you can echo $newsharinglimit >
> > /sys/kernel/mm/ksm/max_page_sharing.
> >
> > Hope this helps,
> > Andrea
>
> Thank you for the detailed explanation. I've cherry-picked these patches
> and am now verifying them. I'll get back to you if there is any
> problem. Thanks!

Andrea,

I have tested these patches. However, with the default max_page_sharing
value of 256, the hung task issue still occurs. I then tried the
following sequence of values to mitigate the symptom. As the value was
decreased, the symptom took longer to reproduce. Finally, I tried 8 and
did not go any lower.

128 -> 64 -> 32 -> 16 -> 8

The crashdump has also been investigated.

stable_node: 0xffff880d36413040 stable_node->hlist->first = 0xffff880e4c9f4cf0
crash> list hlist_node.next 0xffff880e4c9f4cf0  > rmap_item.lst

$ wc -l rmap_item.lst
8 rmap_item.lst

This shows that the list is indeed reduced to 8 items. I wonder whether
the loop is still consuming a lot of time and holding the mmap_sem for
too long.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .