On Wed, 3 Dec 2003, Nathan Scott wrote:
>
> The XFS tests just tripped up a panic in raid5 in -test11 -- a kdb
> stacktrace follows.  Seems to be reproducible, but not always the
> same test that causes it.  And I haven't seen a double bio_put yet,
> this first problem keeps getting in the way I guess.

Ok, debugging this oops makes me _think_ that the problem comes from here:

raid5.c, around line 1000:

	....
	wbi = dev->written;
	dev->written = NULL;
	while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
		wbi2 = wbi->bi_next;
		if (--wbi->bi_phys_segments == 0) {
			md_write_end(conf->mddev);
			wbi->bi_next = return_bi;
			return_bi = wbi;
		}
		wbi = wbi2;
	}
	....

where it appears that the "wbi->bi_sector" access takes a page fault,
probably due to PAGE_ALLOC debugging.  It appears that somebody has
already finished (and thus free'd) that bio.

I dunno - I can't follow what that code does at all.  One problem is that
the slab code - because it caches the slabs and shares pages between
different slab entries - will not trigger the bugs that DEBUG_PAGEALLOC
would show very easily.  So here's my ugly hack once more, to see if that
makes the bug show up more repeatably and more quickly.

Nathan?

		Linus

-+- slab-debug-on-steroids -+-

NOTE! For this patch to make sense, you have to enable the page allocator
debugging thing (CONFIG_DEBUG_PAGEALLOC), and you have to live with the
fact that it wastes a _lot_ of memory.

There's another problem with this patch: if the bug is actually in the
slab code itself, this will obviously not find it, since it disables that
code entirely.

===== mm/slab.c 1.110 vs edited =====
--- 1.110/mm/slab.c	Tue Oct 21 22:10:10 2003
+++ edited/mm/slab.c	Mon Dec  1 15:29:06 2003
@@ -1906,6 +1906,21 @@
 
 static inline void * __cache_alloc (kmem_cache_t *cachep, int flags)
 {
+#if 1
+	void *ptr = (void*)__get_free_pages(flags, cachep->gfporder);
+	if (ptr) {
+		struct page *page = virt_to_page(ptr);
+		SET_PAGE_CACHE(page, cachep);
+		SET_PAGE_SLAB(page, 0x01020304);
+		if (cachep->ctor) {
+			unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR;
+			if (!(flags & __GFP_WAIT))
+				ctor_flags |= SLAB_CTOR_ATOMIC;
+			cachep->ctor(ptr, cachep, ctor_flags);
+		}
+	}
+	return ptr;
+#else
 	unsigned long save_flags;
 	void* objp;
 	struct array_cache *ac;
@@ -1925,6 +1940,7 @@
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
 	return objp;
+#endif
 }
 
 /*
@@ -2042,6 +2058,15 @@
  */
 static inline void __cache_free (kmem_cache_t *cachep, void* objp)
 {
+#if 1
+	{
+		struct page *page = virt_to_page(objp);
+		int order = cachep->gfporder;
+		if (cachep->dtor)
+			cachep->dtor(objp, cachep, 0);
+		__free_pages(page, order);
+	}
+#else
 	struct array_cache *ac = ac_data(cachep);
 
 	check_irq_off();
@@ -2056,6 +2081,7 @@
 		cache_flusharray(cachep, ac);
 		ac_entry(ac)[ac->avail++] = objp;
 	}
+#endif
 }
 
 /**
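
For what it's worth, the reason this combination catches use-after-free so
well is that DEBUG_PAGEALLOC unmaps pages as soon as they are freed; once
every object sits on its own page (which is exactly what the hack above
forces), a stale pointer like the suspected wbi faults on the very first
dereference instead of silently reading recycled slab memory.  Below is a
minimal user-space sketch of that idea, assuming only POSIX mmap/mprotect;
page_alloc()/page_free() are made-up helper names, not kernel APIs.

	/*
	 * User-space sketch of the DEBUG_PAGEALLOC idea the hack above
	 * relies on: every object gets its own page(s), and "free" revokes
	 * all access instead of recycling the memory, so a stale pointer
	 * faults immediately.  page_alloc()/page_free() are hypothetical
	 * helpers for illustration only.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	static size_t round_to_pages(size_t size)
	{
		size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
		return (size + pagesize - 1) & ~(pagesize - 1);
	}

	/* Allocate "size" bytes on a dedicated anonymous mapping. */
	static void *page_alloc(size_t size)
	{
		void *p = mmap(NULL, round_to_pages(size),
			       PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return p == MAP_FAILED ? NULL : p;
	}

	/* "Free" by revoking access; any later dereference gets SIGSEGV. */
	static void page_free(void *p, size_t size)
	{
		mprotect(p, round_to_pages(size), PROT_NONE);
	}

	int main(void)
	{
		char *obj = page_alloc(64);

		if (!obj)
			return 1;
		strcpy(obj, "live object");
		printf("%s\n", obj);		/* fine */

		page_free(obj, 64);
		printf("%s\n", obj);		/* use-after-free: dies here */
		return 0;
	}

Built with a plain gcc invocation, the second printf() dies with SIGSEGV,
which is roughly the user-space equivalent of the oops Nathan is seeing
once the bio's backing memory has been pulled out from under raid5.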