On Tuesday, 22 of April 2008, Linus Torvalds wrote:
>
> On Tue, 22 Apr 2008, Rafael J. Wysocki wrote:
> > >
> > > The same place, dentry.d_hash.next is 1. No slub debug clues... I think, I'll
> > > give slab a try. Any other clues?
> >
> > Well, SLUB uses some per CPU data structures. Is it possible that they get
> > corrupted and which leads to the observed symptoms?
>
> It really doesn't look like the slub allocations themselves would be
> corrupted. It very much looks like wild pointers corrupting allocations
> that themselves were fine.
>
> The nybble pattern looked intriguing (especially as it apparently also hit
> a normal page cache page!) but obviously not everything matches that
> pattern (eg your value of 1).
>
> What do you do to trigger this? Any particular load? Is it still just
> doing suspend/resume, or do you have something else that you are playing
> with?

I've seen that only once, so far. Jiri seems to be able to trigger it more often.

> Also, have you tried CONFIG_DEBUG_PAGEALLOC? That can also be a very
> powerful way to find memory corruption.

I always have CONFIG_DEBUG_PAGEALLOC set.

> Does anybody see any other patterns? Looking at the modules linked in in
> the oopses from Zdenek, Rafael and Jiri, I don't see anything odd. You
> both all have 80211 support, maybe the corruption comes from the wireless
> layer?

Well, I thought about that too. However, I had a hang before 2.6.25-git2
that I suspect was related (I couldn't get any information from the box, as
it just hung solid), so I'd rather suspect some x86 changes.

> Or maybe it's the x86 code changes themselves, and it really is about the
> suspend/resume sequence itself.

It seems to be specific to x86-64, AFAICS.

> Are all the people who see this doing suspends?

I'm not sure.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html