On Fri, Dec 21, 2012 at 6:16 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>> Could you share your test case so I can try reproducing the issue
>> you're seeing?
>
> Not so easy. My test case is a large chunk of a high-frequency
> trading system :)

Huh, it's probably better if I don't see it then :)

> I just tried it again. Now I have a task stuck in
> mlockall(MCL_CURRENT|MCL_FUTURE). The stack is:
>
> [<0000000000000000>] flush_work+0x1c2/0x280
> [<0000000000000000>] schedule_on_each_cpu+0xe3/0x130
> [<0000000000000000>] lru_add_drain_all+0x15/0x20
> [<0000000000000000>] sys_mlockall+0x125/0x1a0
> [<0000000000000000>] tracesys+0xd0/0xd5
> [<0000000000000000>] 0xffffffffffffffff
>
> The sequence of mmap and munmap calls, according to strace, is:
> [...]
> 6084 mmap(0x7f54fd02a000, 6776, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd02a000

So I noticed you use mmap with a size that is not a multiple of
PAGE_SIZE. This is perfectly legal, but I hadn't tested that case, and
lo and behold, it's something I got wrong. Patch to be sent as a reply
to this. Without this patch, vm_populate() will show a debug message if
you have CONFIG_DEBUG_VM set, and will likely spin in an infinite loop
if you don't.

> 6084 mmap(NULL, 26258, PROT_READ, MAP_SHARED, 4, 0) = 0x7f5509f9d000
> 6084 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509f9c000
> 6084 munmap(0x7f5509f9c000, 4096) = 0
> 6084 mlockall(MCL_CURRENT|MCL_FUTURE
>
> This task is unkillable. Two other tasks are stuck spinning.
Now I'm confused, because:

1- your trace shows the hang occurs during mlockall(), and this code
   really wasn't touched much in my series (besides renaming
   do_mlock_pages() to __mm_populate())
2- the backtrace above shows sys_mlockall() -> lru_add_drain_all(),
   which is the very beginning of mlockall(), before anything of
   importance happens (and in particular, before the MCL_FUTURE flag
   takes effect).

So, I'm going to assume that it's one of the other spinning threads
that is breaking things. If one of the spinning threads got stuck
within vm_populate(), this could even be explained by the bug I
mentioned above.

Could you check whether the fix I'm going to send as a reply to this
works for you, and, if not, where the two spinning threads are stuck?

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.