----- "ville mattila" <ville.mattila@xxxxxxxxxxxxx> wrote:

> Hello,
>
> We have a custom kernel based on 2.6.27.39. This kernel
> has a 2/2 memory split. Now we have one crash dump that can be
> successfully opened with crash 4.0-8.8 but not with crash 5.0.
> This crashdump happened because of a double free of a memory block,
> so there might be some memory corruption in the cache data area.
>
> Unfortunately I cannot pinpoint the exact version where this
> starts to happen because I could not find older crash releases.
>
> Here is some debug info.
>
> The tail of crash -d 10 output
> ...
> NOTE: page_hash_table does not exist in this kernel
> please wait... (gathering kmem slab cache data)<readmem: 8075801c, KVADDR, "cache_chain", 4, (FOE), ffb944f8>
> addr: 8075801c paddr: 75801c cnt: 4
> GETBUF(128 -> 0)
> FREEBUF(0)
> GETBUF(204 -> 0)
> <readmem: 8067f1c0, KVADDR, "kmem_cache buffer", 204, (FOE), 8520f00>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> GETBUF(128 -> 1)
> FREEBUF(1)
> GETBUF(128 -> 1)
> FREEBUF(1)
>
> kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
> kmem_cache_downsize: nr_node_ids: 1
> FREEBUF(0)
>
> crash: zero-size memory allocation! (called from 80b7b7b)
>
>
> addr2line -e crash 80b7b7b
> /workarea/build/packages/crash/crash-5.0.0-32bit/memory.c:7439
>
> I'm happy to test patches.

Nice bug report!

Here's what's happening. It's related to this patch that went into 4.1.0:

- Fix for a potential failure to initialize the kmem slab cache
  subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
  has pages excluded by the makedumpfile facility. Without the patch,
  the following error message would be displayed during initialization:
  "crash: page excluded: kernel virtual address: <address> type:
  kmem_cache_s buffer", followed by "crash: unable to initialize kmem
  slab cache subsystem".
  (anderson@xxxxxxxxxx)

The patch was put in place due to this definition of the kmem_cache
data structure:

  struct kmem_cache {
  /* 1) per-cpu data, touched during every alloc/free */
          struct array_cache *array[NR_CPUS];
  /* 2) Cache tunables. Protected by cache_chain_mutex */
          unsigned int batchcount;
          unsigned int limit;
  ... [ snip ] ...
           * We put nodelists[] at the end of kmem_cache, because we want to size
           * this array to nr_node_ids slots instead of MAX_NUMNODES
           * (see kmem_cache_init())
           * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
           * is statically defined, so we reserve the max number of nodes.
           */
          struct kmem_list3 *nodelists[MAX_NUMNODES];
          /*
           * Do not add fields after nodelists[]
           */
  };

Every kmem_cache structure in the kernel *except* for the head
"cache_cache" kmem_cache structure has its nodelists[] array downsized
to whatever "nr_node_ids" is initialized to. The actual size of all of
the downsized kmem_cache data structures can be found in the head
"cache_cache.buffer_size" field.

But when the crash utility queries gdb for the size of a kmem_cache
structure, it gets the "full" size as declared in the vmlinux debuginfo
data. And so whenever a kmem_cache structure was read by crash, it was
using the "full" size instead of the downsized size. That kind of
over-sized read could potentially extend into the next page, and there
was a reported case where it happened to extend into a page that was
excluded by makedumpfile. Hence the kmem_cache_downsize() function
added to memory.c.

Anyway, your debug output shows:

  kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
  kmem_cache_downsize: nr_node_ids: 1

In vm_init() there was an initial STRUCT_SIZE_INIT(kmem_cache_s, ...)
that set the size to 204 bytes. But then kmem_cache_downsize() was
called to downsize to whatever cache_cache.buffer_size contains:

        ...
        buffer_size = UINT(cache_buf +
                MEMBER_OFFSET("kmem_cache", "buffer_size"));

        if (buffer_size < SIZE(kmem_cache_s)) {
                ASSIGN_SIZE(kmem_cache_s) = buffer_size;

                if (kernel_symbol_exists("nr_node_ids")) {
                        get_symbol_data("nr_node_ids", sizeof(int),
                                &nr_node_ids);
                        vt->kmem_cache_len_nodes = nr_node_ids;
                } else
                        vt->kmem_cache_len_nodes = 1;

                if (CRASHDEBUG(1)) {
                        fprintf(fp,
                            "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
                            "cache_cache.buffer_size: %d\n",
                                STRUCT_SIZE("kmem_cache"), buffer_size);
                        fprintf(fp,
                            "kmem_cache_downsize: nr_node_ids: %ld\n",
                                vt->kmem_cache_len_nodes);
                }
        }

But your kernel shows cache_cache.buffer_size set to zero -- and the
ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
size from 204 to zero. Later on, that size was used to allocate a
kmem_cache buffer, which failed when a GETBUF() was called with a
zero-size.

I guess a check could be made above for a zero cache_cache.buffer_size,
but why would that ever be?

Try this:

  # crash --no_kmem_cache vmlinux vmcore

which will allow you to get past the kmem_cache initialization.
Then enter:

  crash> p cache_cache

Does the "buffer_size" member really show zero?

BTW, you can work around the problem by commenting out the call
to kmem_cache_downsize() in vm_init(). (And if you're using
makedumpfile with excluded pages, hope that the problem I described
above doesn't occur...)

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
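
As a footnote to the zero-size check mentioned above, here is a minimal
standalone sketch of such a guard. It assumes the simplest policy: leave
the structure size alone when cache_cache.buffer_size reads as zero.
pick_kmem_cache_size() is a hypothetical helper made up for illustration,
not a function in crash's memory.c, and it says nothing about why the
field would be zero in the first place.

```c
#include <assert.h>

/*
 * Hypothetical helper, NOT part of crash: choose the size to use when
 * reading kmem_cache structures from the dump.
 *
 *   full_size   - structure size from the vmlinux debuginfo
 *                 (204 in the report above)
 *   buffer_size - value read from cache_cache.buffer_size
 */
static long
pick_kmem_cache_size(long full_size, unsigned int buffer_size)
{
        assert(full_size > 0);

        /* Zero cannot be a real object size, so keep the full size. */
        if (buffer_size == 0)
                return full_size;

        /* Only ever shrink; never grow past the declared size. */
        if ((long)buffer_size < full_size)
                return (long)buffer_size;

        return full_size;
}
```

With the values from the debug output, pick_kmem_cache_size(204, 0)
keeps the size at 204, so a later allocation based on it would not be
asked for zero bytes. The over-sized read described earlier would then
still be possible on dumpfiles with excluded pages.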