On Wed, Jul 29, 2015 at 03:28:03PM -0700, Andrew Morton wrote: > > That patch is a bit of a mess. Below is a cleaned up version. > > It appears that the regression you've identified was caused by > > commit e33660165c901d18e7d3df2290db070d3e4b46df > Author: Christoph Lameter <cl@xxxxxxxxx> > Date: Thu Jan 10 19:14:18 2013 +0000 > > slab: Use common kmalloc_index/kmalloc_size functions > > > Reviewers sought, please. > > > > From: Liuhailong <liu.hailong6@xxxxxxxxxx> > Subject: slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE + 1) > > Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next larger > cache size than the size index INDEX_NODE mapping. In kernels 3.9 and > earlier we used malloc_sizes[INDEX_L3 + 1].cs_size. > > However, sometimes we can't get the right output we expected via > kmalloc_size(INDEX_NODE + 1), causing a BUG(). > > The mapping table in the latest kernel is like: > index = {0, 1, 2 , 3, 4, 5, 6, n} > size = {0, 96, 192, 8, 16, 32, 64, 2^n} > The mapping table before 3.10 is like this: > index = {0 , 1 , 2, 3, 4 , 5 , 6, n} > size = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)} > > The problem on my mips64 machine is as follows: > > (1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC > && DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150", > and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE > kmalloc_index(sizeof(struct kmem_cache_node)) > > (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8. > > (3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size > = PAGE_SIZE". > > (4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and > "flags |= CFLGS_OFF_SLAB" will be covered. > > (5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to > "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result > here may be NULL while kernel bootup. > > (6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the > BUG info as the following shows (may be only mips64 has this problem): > > #20 > task: ffffffffc072cdc0 ti: ffffffffc06b4000 task.ti: ffffffffc06b4000 > $ 0 : 0000000000000000 0000000000000018 0000000000000001 0000000100000fff > $ 4 : 0000000000000030 0000000000000000 0000000000001004 0000000000001000 > $ 8 : ffffffff80002800 000000000000000b 0000000000000000 0000000000000000 > $12 : 0000000080000000 0000000000000000 c0000000bf818ebc c0000000bf818eb8 > $16 : c0000000bf818ea0 0000000080002800 0000000000000000 0000000000001000 > $20 : 0000000000000034 0000000080000000 0000000000000000 0000000000000006 > $24 : ffffffffc1160000 00000000000003f4 > $28 : ffffffffc06b4000 ffffffffc06b7d40 0000000000002000 ffffffffc01d077c > Hi : 0000000000000fff > Lo : 0000000000100000 > epc : ffffffffc01d0784 __kmem_cache_create+0x2ac/0x530 > Not tainted > ra : ffffffffc01d077c __kmem_cache_create+0x2a4/0x530 > Status: 141000e2 KX SX UX KERNEL EXL > Cause : 40808034 > PrId : 000c1300 (Broadcom XLPII) > Process swapper (pid: 0, threadinfo=ffffffffc06b4000, task=ffffffffc072cdc > 0,tls=0000000000000000) > *HwTLS: fffffffffadebeef > Stack : 00000000c073b018 c0000000bf818ea0 0000000000000040 0000000000000000 > ffffffffc115b360 ffffffffc115b360 0000000000000017 0000000000000007 > 0000000000000006 ffffffffc0780f54 0000000000002000 0000000000000040 > c0000000bf818ea0 0000000000000000 0000000000000040 ffffffffc0780fec > 0000000000002000 c0000000bf810fc0 ffffffffc115b390 0000000000000006 > ffffffffc1160000 ffffffffc07810b0 0000000000000001 c0000000bf809000 > ffffffffc0a30000 ffffffffc07a0000 ffffffffc07a1758 ffffffffc0a30000 > ffffffff8c190000 ffffffff8bd02983 0000000000000000 ffffffff8c18f798 > 0000000000000000 ffffffffc07746e0 ffffffffc07a1758 0000000000000000 > 0000000000000000 ffffffff805c5580 0000000000000000 0000000000000000 > ... > Call Trace: > [<ffffffffc01d0784>] __kmem_cache_create+0x2ac/0x530 > [<ffffffffc0780f54>] create_boot_cache+0x54/0x90 > [<ffffffffc0780fec>] create_kmalloc_cache+0x5c/0x94 > [<ffffffffc07810b0>] create_kmalloc_caches+0x8c/0x1b0 > [<ffffffffc07746e0>] start_kernel+0x1a0/0x408 > > This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) removes the > BUG. I tested it on my mips64 mechine. > > Signed-off-by: Liuhailong <liu.hailong6@xxxxxxxxxx> > Reviewed-by: Jianxuexin <jiang.xuexin@xxxxxxxxxx> > Cc: Christoph Lameter <cl@xxxxxxxxx> > Cc: Pekka Enberg <penberg@xxxxxxxxxx> > Cc: David Rientjes <rientjes@xxxxxxxxxx> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > --- > > mm/slab.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff -puN mm/slab.c~slab-fix-the-unexpected-index-mapping-result-of-kmalloc_sizeindex_node-1 mm/slab.c > --- a/mm/slab.c~slab-fix-the-unexpected-index-mapping-result-of-kmalloc_sizeindex_node-1 > +++ a/mm/slab.c > @@ -2190,7 +2190,7 @@ __kmem_cache_create (struct kmem_cache * > size += BYTES_PER_WORD; > } > #if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC) > - if (size >= kmalloc_size(INDEX_NODE + 1) > + if (size >= kmalloc_size(INDEX_NODE) * 2 > && cachep->object_size > cache_line_size() > && ALIGN(size, cachep->align) < PAGE_SIZE) { > cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align); Hello, I don't think that this fix is right. Just "kmalloc_size(INDEX_NODE) * 2" looks insane because it means 192 * 2 = 384 on his platform. Why we need to check size is larger than 384? I'm wondering what's the meaning of this check "size >= kmalloc_size(INDEX_NODE + 1)". Requirement for activating debug_pagealloc may be off-slab freelist management. It can be possible after some of kmalloc cache is enabled for off-slab freelist management, so what we need to check here is somethinkg like following. - if (size >= kmalloc_size(INDEX_NODE + 1) + if (!slab_early_init && + size >= XXX In fact, we can activate debug_pagealloc for more less sized kmem_cache. Perhaps, size >= 64 is okay, too. But, it increases memory usage greatly so we need pre-determined value here. Before Christoph's commit e498be7daf in 2005, it is set to 128. So, I think that 128 or 256 is good choice here. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>