On Sat, Dec 22, 2012 at 1:37 AM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
> On Fri, Dec 21, 2012 at 6:16 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>>> Could you share your test case so I can try reproducing the issue
>>> you're seeing ?
>>
>> Not so easy.  My test case is a large chunk of a high-frequency
>> trading system :)
>
> Huh, it's probably better if I don't see it then :)
>
>> I just tried it again.  Now I have a task stuck in
>> mlockall(MCL_CURRENT|MCL_FUTURE).  The stack is:
>>
>> [<0000000000000000>] flush_work+0x1c2/0x280
>> [<0000000000000000>] schedule_on_each_cpu+0xe3/0x130
>> [<0000000000000000>] lru_add_drain_all+0x15/0x20
>> [<0000000000000000>] sys_mlockall+0x125/0x1a0
>> [<0000000000000000>] tracesys+0xd0/0xd5
>> [<0000000000000000>] 0xffffffffffffffff
>>
>> The sequence of mmap and munmap calls, according to strace, is:
>>
> [...]
>> 6084  mmap(0x7f54fd02a000, 6776, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f54fd02a000
>
> So I noticed you use mmap with a size that is not a multiple of
> PAGE_SIZE. This is perfectly legal, but I hadn't tested that case, and
> lo and behold, it's something I got wrong. Patch to be sent as a reply
> to this. Without this patch, vm_populate() will show a debug message
> if you have CONFIG_DEBUG_VM set, and likely spin in an infinite loop
> if you don't.
>
>> 6084  mmap(NULL, 26258, PROT_READ, MAP_SHARED, 4, 0) = 0x7f5509f9d000
>> 6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5509f9c000
>> 6084  munmap(0x7f5509f9c000, 4096)      = 0
>> 6084  mlockall(MCL_CURRENT|MCL_FUTURE
>>
>> This task is unkillable.  Two other tasks are stuck spinning.
>
> Now I'm confused, because:
>
> 1- your trace shows the hang occurs during mlockall(), and this code
> really wasn't touched much in my series (besides renaming
> do_mlock_pages into __mm_populate())
>
> 2- the backtrace above showed sys_mlockall() -> lru_add_drain_all(),
> which is the very beginning of mlockall(), before anything of
> importance happens (and in particular, before the MCL_FUTURE flag
> takes action). So, I'm going to assume that it's one of the other
> spinning threads that is breaking things. If one of the spinning
> threads got stuck within vm_populate(), this could even be explained
> by the bug I mentioned above.
>
> Could you check if the fix I'm going to send as a reply to this works
> for you, and if not, where the two spinning threads are being stuck ?
>

It works.  In case anyone cares, the whole series is

Tested-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>

I'll let you know if anything else breaks.  I'll be pounding on a
kernel with this patched in for the next couple of days, I expect.

--Andy
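
For anyone following along, the unaligned-length case Michel describes
can be pictured with a small userspace sketch.  This is not the patch
referred to above and not kernel code: the helper names are
hypothetical, and the 4096-byte page size is an assumption (the thread
does not state it).  It only shows how a length such as 6776 rounds up
to whole pages, which is the arithmetic a populate loop has to get
right.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): round len up to the
 * next multiple of page_size.  page_size must be a nonzero power of
 * two so the mask trick below is valid.
 */
static size_t page_align_up(size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}

/*
 * Hypothetical helper: number of whole pages backing a mapping that
 * was requested with an arbitrary (possibly unaligned) length.
 */
static size_t pages_spanned(size_t len, size_t page_size)
{
    return page_align_up(len, page_size) / page_size;
}
```

With the assumed 4096-byte pages, the 6776-byte mapping from the strace
output rounds up to 8192 bytes (two pages), and the 26258-byte mapping
rounds up to 28672 bytes (seven pages).  Code that walks a range in
page-sized steps and compares against the unrounded end address can
step past it without ever matching, which is one plausible way to end
up in the kind of loop described above.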