----- Original Message -----
> More testing revealed a machine in our stable that either failed to
> initialize kmem:
>
> please wait... (gathering kmem slab cache data)
> crash-6.0.3: page excluded: kernel virtual address: ffff8801263d6000
> type: "kmem_cache buffer"
>
> crash-6.0.3: unable to initialize kmem slab cache subsystem
>
> Or succeeded on initialize and then failed on a kmem -s command:
>
> crash-6.0.3> kmem -s
> CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
> Segmentation fault
>
>
> The problem is that the array struct at the end of kmem_cache remains declared as
> 32 elements, but for all dynamically allocated copies, is actually trimmed down
> to nr_cpu_ids in length.
>
> crash-6.0.3.best> struct kmem_cache
> struct kmem_cache {
>     unsigned int batchcount;
>     ...
>
>     struct list_head next;
>     struct kmem_list3 **nodelists;
>     struct array_cache *array[32];
> }
> SIZE: 368
>
>
> On my normal play machine, nr_cpu_ids = 32 and actual cpus = 16.
>
> On the failing machine, nr_cpu_ids and actual cpus are both 2.
>
> Two problems occur:
>
> 1) max_cpudata_limit traverses the array until it finds a 0x0 or
> reaches the real size. On the 2-cpu system, the "third" element in the
> array belonged elsewhere, was non-zero, and pointed to data that caused
> the apparent limit to be 0xffffffffffff8801, which didn't work well as
> a length in a memcopy.

But your patch does this:

@@ -8117,8 +8135,9 @@ kmem_cache_s_array_nodes:
 		    "array cache array", RETURN_ON_ERROR))
 			goto bail_out;
 
-		for (i = max_limit = 0; (i < ARRAY_LENGTH(kmem_cache_s_array)) &&
-		    cpudata[i]; i++) {
+		for (i = max_limit = 0; (i < kmem_cache_nr_cpu)
+		    && (i < ARRAY_LENGTH(kmem_cache_s_array))
+		    && cpudata[i]; i++) {
 			if (!readmem(cpudata[i]+OFFSET(array_cache_limit),
 			    KVADDR, &limit, sizeof(int),
 			    "array cache limit", RETURN_ON_ERROR))

On "old" slab systems, your new "kmem_cache_nr_cpu" variable remains at
its initialized value of zero, and the loop never gets entered.
So I don't think you wanted to keep the (i < kmem_cache_nr_cpu) there, right?

> 2) kmem_cache structs can be allocated near enough to the edge of a page
> that the old incorrect length crosses the page boundary, even though the
> real smaller structure fits in the page. That caused a readmem of the
> structure to cross into a coincidentally missing page in the dump.

Right -- that was the genesis of the kmem_cache_downsize() function.

> This patch fixes both of those (after wrestling ARRAY_LENGTH to the
> ground), but *does not* fix the similar page crossing problem when I try
> to use a "struct kmem_cache" command on the particular structure at the
> end of the page.

Yeah, damn, I don't know what can be done for that, aside from some
horrific kludge to gdb_readmem_callback() to return successfully even
if the readmem() failed.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility