On Tue, Sep 27, 2016 at 12:33:08PM -0600, Chris Friesen wrote:
>
> Sorry, I had a typo in my earlier message. The issue is actually in slub.c.
>
> Chris
>
> On 09/27/2016 10:12 AM, Chris Friesen wrote:
> >
> >I've got a CentOS 7 kernel that has been slightly modified, but the mm
> >subsystem hasn't been touched. I'm hoping you can give me some guidance.
> >
> >I have an intermittent Oops that looks like what is below. The issue
> >is currently occurring on one CPU of one system, but has been seen
> >before infrequently. Once the corruption occurs it causes an Oops on
> >every call to __mpol_dup() on this CPU.
> >
> >Basically it appears that __mpol_dup() is failing because the value of
> >c->freelist in slab_alloc_node() is corrupt, causing the call to
> >get_freepointer_safe(s, object) to Oops because it tries to dereference
> >"object + s->offset". (Where s->offset is zero.)
> >
> >In the trace, "kmem_cache_alloc+0x87" maps to the following assembly:
> >    0xffffffff8118be17 <+135>:   mov    (%r12,%rax,1),%rbx
> >
> >This corresponds to this line in get_freepointer():
> >    return *(void **)(object + s->offset);
> >
> >In the assembly code, R12 is "object", and RAX is s->offset.
> >
> >So the question becomes, why is "object" (which corresponds to c->freelist)
> >corrupt?
> >
> >Looking at the value of R12 (0x1ada8000), it's nonzero but also not a
> >valid pointer. Does the value mean anything to you? (I'm not really
> >a memory subsystem guy, so I'm hoping you might have some ideas.)
> >
> >Do you have any suggestions on how to track down what's going on here?

Please run with the kernel parameter "slub_debug=F" (SLUB sanity checks)
or another slub_debug option. See Documentation/vm/slub.txt.

Thanks.
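For readers less familiar with SLUB internals, here is a minimal user-space sketch (an illustration under stated assumptions, not kernel code) of the mechanism described in the quoted report: each free object stores the address of the next free object at s->offset, so a corrupt freelist head gets dereferenced on the next allocation. The struct toy_cache type and all toy_* functions are hypothetical. The pointer check is only loosely modeled on check_valid_pointer() in mm/slub.c, in the spirit of the consistency checks slub_debug enables, and 0x1ada8000 is reused purely as example garbage; nothing here explains where that value came from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy model of a SLUB-style slab; not kernel code or API. */
struct toy_cache {
    size_t size;   /* object size in bytes (multiple of sizeof(void *)) */
    size_t offset; /* where the free pointer lives inside a free object */
    size_t nobjs;  /* number of objects in the single slab */
};

/* Same shape as get_freepointer(): the next free object's address is
 * stored inside the current free object at byte offset s->offset. */
static void *toy_get_freepointer(const struct toy_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

static void toy_set_freepointer(const struct toy_cache *s, void *object, void *next)
{
    *(void **)((char *)object + s->offset) = next;
}

/* Loosely modeled on check_valid_pointer() in mm/slub.c: a free pointer
 * must be NULL or the start of an object inside this slab. */
static int toy_valid_pointer(const struct toy_cache *s, const char *base, const void *p)
{
    uintptr_t b = (uintptr_t)base, q = (uintptr_t)p;

    if (p == NULL)
        return 1;
    if (q < b || q >= b + s->nobjs * s->size)
        return 0;
    return (q - b) % s->size == 0;
}

/* Chain every object into the freelist; the last one points to NULL. */
static void *toy_init_slab(const struct toy_cache *s, char *base)
{
    assert(s->offset + sizeof(void *) <= s->size);
    for (size_t i = 0; i < s->nobjs; i++) {
        void *next = (i + 1 < s->nobjs) ? base + (i + 1) * s->size : NULL;
        toy_set_freepointer(s, base + i * s->size, next);
    }
    return base;
}

/* Pop the freelist head. Without the validity check, a garbage head would
 * be dereferenced by toy_get_freepointer(), the same kind of dereference
 * the report describes in the real allocator. */
static void *toy_alloc(const struct toy_cache *s, const char *base, void **freelist)
{
    void *object = *freelist;

    if (object == NULL)
        return NULL;                      /* slab exhausted */
    if (!toy_valid_pointer(s, base, object)) {
        fprintf(stderr, "corrupt freelist head %p\n", object);
        return NULL;                      /* refuse to follow it */
    }
    *freelist = toy_get_freepointer(s, object);
    return object;
}

/* Allocate once, simulate a stray write into the next free object's link
 * word, then keep allocating. Returns how many objects were handed out. */
static int toy_demo(void)
{
    struct toy_cache s = { .size = 64, .offset = 0, .nobjs = 4 };
    char *base = aligned_alloc(64, s.size * s.nobjs);
    int n = 0;

    if (base == NULL)
        return -1;
    void *freelist = toy_init_slab(&s, base);
    if (toy_alloc(&s, base, &freelist) != NULL)
        n++;
    /* Example garbage only: the R12 value from the report. */
    toy_set_freepointer(&s, freelist, (void *)(uintptr_t)0x1ada8000);
    while (toy_alloc(&s, base, &freelist) != NULL)
        n++;
    free(base);
    return n;
}
```

Calling toy_demo() from a main() returns 2: the first two objects are handed out normally, after which the freelist head is the planted garbage value. The freelist is never repaired, so every later toy_alloc() on it fails the same way, which loosely parallels the "every call on this CPU" behavior in the report.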