On Fri, 5 Aug 2011, Mel Gorman wrote:

> > > This is interesting, I just change as following:
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index eb5a8f9..616b78e 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2104,8 +2104,9 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > >  			"__slab_alloc"));
> > >
> > >  	if (unlikely(!object)) {
> > > -		c->page = NULL;
> > > +		//c->page = NULL;
> > >  		stat(s, DEACTIVATE_BYPASS);
> > > +		deactivate_slab(s, c);
> > >  		goto new_slab;
> > >  	}
> > >
> > > Then my system doesn't print any list corruption warnings and my build
> > > success then. So this means revert of 03e404af2 could cure this.
> > > I'll do more test next week to see if the list corruption still exist, thanks.
> > >
> >
> > Sorry, please ignore it... My system corrupted before I went to leave ....
> >
>
> Please continue the bisection in that case and establish for sure if the
> problem is in that series or not. Thanks.

The above change should not affect anything, since a per-cpu slab is not on
any partial list. And since there are no free objects remaining in the slab,
there is also no point in putting it back. It won't be on any list either
before or after the action, so no list processing is needed.

Hmmm... There may be a race with slab_free from a remote processor. I don't
see a problem here, though: we convert the page from frozen to non-frozen in
__slab_alloc, and __slab_free skips partial list management if it sees the
page as frozen.

Maybe we need some memory barriers here. Right now we rely on cmpxchg_double
to synchronize the state in the page struct, but we also need the c->page
variable to be consistent with that state. However, we disable interrupts in
__slab_alloc, so no race is possible with a local slab_free; the only
possible races are with remote __slab_free invocations, which will not touch
c->page.
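To make the frozen/non-frozen handshake above concrete, here is a minimal single-threaded sketch. It is a hypothetical toy model, not SLUB code: `struct toy_page`, `FROZEN_BIT`, `toy_page_init()`, `toy_unfreeze()` and `toy_remote_free()` are invented names, and instead of cmpxchg_double on two words it packs a frozen flag and an in-use count into one 64-bit word updated with C11 `atomic_compare_exchange_weak`. It does not model c->page, interrupt disabling, the freelist, or the locking that real partial list manipulation needs. The point it shows is that the remote free decides whether to touch partial lists based on the frozen bit it observed in the same atomic step that released the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not SLUB code: the frozen flag and the in-use
 * object count share one 64-bit word, so a single C11 compare-and-swap
 * updates both together (the kernel uses cmpxchg_double on two words). */
#define FROZEN_BIT (UINT64_C(1) << 63)

struct toy_page {
	_Atomic uint64_t state;	/* FROZEN_BIT | inuse */
	bool on_partial_list;	/* stands in for partial list membership;
				 * real code would need locking for this */
};

static void toy_page_init(struct toy_page *page, bool frozen, uint64_t inuse)
{
	atomic_init(&page->state, (frozen ? FROZEN_BIT : 0) | inuse);
	page->on_partial_list = false;
}

/* Owner CPU side (cf. __slab_alloc finding no free object): clear the
 * frozen flag. A frozen page is on no partial list, and a page with no
 * free objects does not need to be put on one, so no list work happens. */
static void toy_unfreeze(struct toy_page *page)
{
	uint64_t old = atomic_load(&page->state);

	/* On failure, 'old' is reloaded with the current value; retry. */
	while (!atomic_compare_exchange_weak(&page->state, &old,
					     old & ~FROZEN_BIT))
		;
}

/* Remote CPU side (cf. __slab_free): release one object. Returns true
 * if partial list management was performed, i.e. the page was observed
 * as not frozen in the same atomic step that dropped the count. */
static bool toy_remote_free(struct toy_page *page)
{
	uint64_t old = atomic_load(&page->state);
	uint64_t new_state;

	do {
		/* Precondition of the toy: at least one object is in use. */
		assert((old & ~FROZEN_BIT) > 0);
		new_state = old - 1;	/* frozen bit is left unchanged */
	} while (!atomic_compare_exchange_weak(&page->state, &old, new_state));

	/* On success 'old' still holds the value that was replaced. */
	if (old & FROZEN_BIT)
		return false;	/* owner CPU still holds it: skip the lists */

	page->on_partial_list = true;	/* now has a free object */
	return true;
}
```

A free that lands while the page is still frozen leaves list handling to the owner; the same free after the unfreeze is what puts the page on a partial list, so the owner never has to do list work at unfreeze time.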