On Mon, Sep 08, 2014 at 01:56:55PM -0400, Sasha Levin wrote:
> On 09/08/2014 01:18 PM, Mel Gorman wrote:
> > A worse possibility is that somehow the lock is getting corrupted but
> > that's also a tough sell considering that the locks should be allocated
> > from a dedicated cache. I guess I could try breaking that to allocate
> > one page per lock so DEBUG_PAGEALLOC triggers but I'm not very
> > optimistic.
>
> I did see ptl corruption couple days ago:
>
> https://lkml.org/lkml/2014/9/4/599
>
> Could this be related?
>

Possibly, although the likely explanation would then be that there is just
general corruption coming from somewhere. Even using your config and
applying a patch to make linux-next boot (already in Tejun's tree), I was
unable to reproduce the problem after running for several hours. I had to
run trinity on tmpfs as ext4 and xfs blew up almost immediately, so I have
a few questions.

1. What filesystem are you using?

2. What compiler are you using, in case it's an experimental one? I ask
   because I think I saw a patch from you adding support so that the
   kernel would build with gcc 5.

3. Does your hardware support TSX or anything similarly funky that would
   potentially affect locking?

4. How many sockets are on your test machine, in case reproducing it
   depends on a machine large enough to open a timing race?

As I'm drawing a blank on what would trigger the bug, I'm hoping I can
reproduce this locally and experiment a bit.

Thanks.

--
Mel Gorman
SUSE Labs