[Crash-utility] crash: struct command can read irrelevant pages.

anderson@xxxxxxxxxx (Dave Anderson) · Thu, 20 Feb 2014 15:45:50 -0500 (EST)

Hello Atsushi,

I've committed a SLAB/SLUB kmem_cache-specific fix for this issue:

  https://github.com/crash-utility/crash/commit/c0b7a74fc13121203810d06d163550436b2d5476

which is queued for crash-7.0.6.

Thanks,
  Dave

----- Original Message -----
> 
> 
> ----- Original Message -----
> > Hello,
> > 
> > I've finally found the cause of the issue that I mentioned (quoted below)
> > when makedumpfile v1.5.5 was released:
> > 
> > > 2. At first, the supported kernel will be updated to 3.12, but I
> > > found an issue while testing for v1.5.5, which seems that the page
> > > filtering works wrongly on kernel 3.12. I couldn't investigate this
> > > yet and it will take some time to finish it.
> > > Therefore, the latest supported kernel version is 3.11 in v1.5.5.
> > 
> > This is neither a kernel issue nor a makedumpfile issue; it is a bug in crash.
> > It can happen when a slab cache is stored near the end of a page.
> > 
> > == Description ==
> > 
> > At first, I saw the error message below when I ran crash on
> > a dumpfile generated by makedumpfile -d2:
> > 
> >     please wait... (gathering kmem slab cache data)
> >     crash: page excluded: kernel virtual address: f4e87000  type: "kmem_cache buffer"
> > 
> >     crash: unable to initialize kmem slab cache subsystem
> > 
> > This message indicates that crash failed to read a slab cache during
> > kmem_cache_init(). According to the output below, the read that failed
> > was for the slab cache stored at f4e86f40:
> > 
> >     crash> p kmem_cache
> >     kmem_cache = $1 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
> >     crash>
> >     crash> list kmem_cache.list -s kmem_cache.name -h 0xc0b1cbc0
> >     ...
> >     f4d37840
> >       name = 0xf4edf540 "uid_cache"
> >     f4e86f40
> >     list: page excluded: kernel virtual address: f4e87000  type: "gdb_readmem_callback"
> > 
> > It seems that the slab cache covers two pages, [f4e86000-f4e87000] and
> > [f4e87000-f4e88000]. Let's confirm its *real* size.
> > 
> > Since every slab cache except kmem_cache_boot is allocated as a slab
> > object, we can check the object size as follows:
> > 
> >   crash> p kmem_cache
> >   kmem_cache = $2 = (struct kmem_cache *) 0xc0b1cbc0 <kmem_cache_boot>
> >   crash> struct kmem_cache.object_size 0xc0b1cbc0
> >     object_size = 104
> >   crash>
> > 
> > In my environment, the size is 104 bytes. Therefore, the slab cache
> > stored at f4e86f40 fits in a single page ([f4e86000-f4e87000]), and
> > the excluded page ([f4e87000-f4e88000]) is unrelated to it.
> > 
> > On the other hand, crash gets the size from vmlinux via gdb,
> > and there it is 216 bytes:
> > 
> >     crash> struct kmem_cache
> >     struct kmem_cache {
> >         unsigned int batchcount;
> >         unsigned int limit;
> >         ...
> >         struct kmem_cache_node **node;
> >         struct array_cache *array[33];
> >     }
> >     SIZE: 216
> >     crash>
> > 
> > So crash wrongly treated both [f4e86000-f4e87000] and [f4e87000-f4e88000]
> > as pages belonging to the slab cache, even though the latter is an
> > irrelevant page.
> > 
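The page arithmetic above can be checked with a small sketch. It is
illustrative only and is not code from crash: the helper names
page_base() and spans_two_pages() are hypothetical, and a 4 KiB page
size is assumed for this 32-bit system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on 32-bit x86. */
#define PAGE_SIZE_ASSUMED 4096u

/* Base address of the page that contains addr. */
static uint32_t page_base(uint32_t addr)
{
        return addr & ~(PAGE_SIZE_ASSUMED - 1u);
}

/* True if the object [addr, addr + size) touches more than one page. */
static bool spans_two_pages(uint32_t addr, uint32_t size)
{
        uint32_t last_byte;

        assert(size > 0);
        last_byte = addr + size - 1u;

        return page_base(addr) != page_base(last_byte);
}
```

With the values from this report, spans_two_pages(0xf4e86f40, 104) is
false, while spans_two_pages(0xf4e86f40, 216) is true because the last
byte would fall at f4e87017, inside the excluded page.
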
> > This discrepancy comes from the fact that the size of struct kmem_cache
> > is variable.
> > 
> >     struct kmem_cache {
> >     ...
> >             struct kmem_cache_node **node;
> >             struct array_cache *array[NR_CPUS + MAX_NUMNODES];
> >             /*
> >              * Do not add fields after array[]
> >              */
> >     };
> > 
> > The size of "array" is the variable part of kmem_cache.
> > When vmlinux is built, the size of kmem_cache is calculated from
> > NR_CPUS and MAX_NUMNODES and recorded in vmlinux as debug information.
> > (Sorry, I don't know gcc well, so I may have misunderstood this.)
> > However, the actual size is smaller than the defined size, because
> > it is determined at boot from the actual number of CPUs and nodes.
> > 
> > void __init kmem_cache_init(void)::
> > ...
> >         /*
> >          * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
> >          */
> >         create_boot_cache(kmem_cache, "kmem_cache",
> >                 offsetof(struct kmem_cache, array[nr_cpu_ids]) +
> >                                   nr_node_ids * sizeof(struct kmem_cache_node *),  // object_size
> >                                   SLAB_HWCACHE_ALIGN);
> >         list_add(&kmem_cache->list, &slab_caches);
> > 
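The difference between the two sizes can be sketched as follows. This
is a minimal illustration, not kernel code: struct mock_kmem_cache is a
hypothetical, cut-down stand-in for the real structure, and the limits
NR_CPUS_ASSUMED and MAX_NUMNODES_ASSUMED are assumed values chosen only
so that the array has 33 slots like the one gdb printed above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed build-time limits (not taken from the reporter's .config). */
#define NR_CPUS_ASSUMED      32
#define MAX_NUMNODES_ASSUMED 1

/* Hypothetical, cut-down stand-in for the SLAB struct kmem_cache. */
struct mock_kmem_cache {
        unsigned int batchcount;
        unsigned int limit;
        void **node;
        void *array[NR_CPUS_ASSUMED + MAX_NUMNODES_ASSUMED];
};

/* Size seen in debuginfo: the full compile-time array is included. */
static size_t debuginfo_size(void)
{
        return sizeof(struct mock_kmem_cache);
}

/*
 * Size actually allocated at boot, following the create_boot_cache()
 * argument quoted above: the array is cut off after nr_cpu_ids slots,
 * then nr_node_ids pointers are added.
 */
static size_t runtime_size(unsigned int nr_cpu_ids, unsigned int nr_node_ids)
{
        assert(nr_cpu_ids + nr_node_ids <=
               NR_CPUS_ASSUMED + MAX_NUMNODES_ASSUMED);

        return offsetof(struct mock_kmem_cache, array)
                + nr_cpu_ids * sizeof(void *)
                + nr_node_ids * sizeof(void *);
}
```

In the report, 216 - 104 = 112 bytes, which is 28 four-byte pointers.
That would be consistent with only 5 of the 33 array slots being in use,
although the actual CPU and node counts are not stated in this thread.
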
> > 
> > As for kmem_cache, we can get its actual size from kmem_cache_boot,
> > but I suppose that kmem_cache is not the only struct in the kernel whose
> > size is variable. So I think we should discuss how to address issues
> > like this in general.
> > 
> > By the way, I described the *SLAB* case in this mail,
> > but SLUB seems to have the same issue.
> > 
> > 
> > Thanks
> > Atsushi Kumagai
> 
> 
> This is a "known" issue that has been discussed on the crash-utility list
> in the past, at least with respect to the kmem_cache data structure.  But
> for any random data structure that has such a construct, I'm not sure what
> can be done.
> 
> In the case of the CONFIG_SLAB kmem_cache data structure, there is a function
> that is supposed to "downsize" the size value of the kmem_cache data
> structure that is returned by gdb.  It is called here in kmem_cache_init(),
> just prior to cycling through all of the kmem_cache structures, where the
> page excluded error shown above occurred:
> 
>         if (!(pc->flags & RUNTIME))
>                 kmem_cache_downsize();
>
>         cache_buf = GETBUF(SIZE(kmem_cache_s));
>         hq_open();
>
>         do {
>                 cache_count++;
>
>                 if (!readmem(cache, KVADDR, cache_buf, SIZE(kmem_cache_s),
>                         "kmem_cache buffer", RETURN_ON_ERROR)) {
>                         FREEBUF(cache_buf);
>                         vt->flags |= KMEM_CACHE_UNAVAIL;
>                         error(INFO,
>                           "%sunable to initialize kmem slab cache subsystem\n\n",
>                                 DUMPFILE() ? "\n" : "");
>                         hq_close();
>                         return;
>                 }
> 
> The SIZE(kmem_cache_s) value should have been downsized by that function,
> but presumably it did not work.  If CRASHDEBUG(1) had been turned on during
> initialization, you would have seen one of these two messages from
> kmem_cache_downsize():
>  
>                 if (CRASHDEBUG(1))
>                         fprintf(fp, "kmem_cache_downsize: %ld to %ld\n",
>                                 STRUCT_SIZE("kmem_cache"),
>                                 SIZE(kmem_cache_s));
> 
> or:
> 
>                 if (CRASHDEBUG(1)) {
>                         fprintf(fp,
>                             "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
>                             "cache_cache.buffer_size: %d\n",
>                                 STRUCT_SIZE("kmem_cache"), buffer_size);
>                         fprintf(fp,
>                             "kmem_cache_downsize: nr_node_ids: %ld\n",
>                                 vt->kmem_cache_len_nodes);
>                 }
> 
> The function probably failed due to some kernel change.  In fact,
> I just checked a 3.13 CONFIG_SLAB kernel, and I see that
> kmem_cache_downsize() no longer works for that kernel.
> 
> I see that kmem_cache_boot would be a good alternative for determining
> the size on CONFIG_SLAB kernels, at least on 3.7 and later kernels where
> it was introduced.  And for CONFIG_SLUB, which doesn't currently have a
> "downsize" function, it looks like its "kmem_cache" cache also has size
> fields that could be used.
> 
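The idea of preferring a size read from the dump over the debuginfo
size could be sketched as below. This is only an illustration of the
approach, not the code in crash: kmem_cache_read_size() is a
hypothetical helper, and it assumes the caller has already read a size
field such as kmem_cache_boot.object_size, passing 0 when that field
is unavailable.

```c
#include <assert.h>

/*
 * Hypothetical helper, not crash's actual code: choose how many bytes
 * to read for one kmem_cache structure.
 *
 * debuginfo_size      - structure size reported by gdb from vmlinux
 * runtime_object_size - size read from the dump (for example the
 *                       object_size of the boot cache), or 0 if it
 *                       could not be read
 */
static long kmem_cache_read_size(long debuginfo_size, long runtime_object_size)
{
        assert(debuginfo_size > 0);

        /* Trust the runtime value only if it is sane and actually smaller. */
        if (runtime_object_size > 0 && runtime_object_size < debuginfo_size)
                return runtime_object_size;

        /* Otherwise fall back to the size from debuginfo. */
        return debuginfo_size;
}
```

With the numbers from this thread, kmem_cache_read_size(216, 104)
returns 104, so a read starting at f4e86f40 would stay within the
first page.
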
> By any chance can you make the 32-bit vmlinux/vmcore pair available for
> me to download?  Reply to me off-list if you can.
> 
> Thanks,
>   Dave
> 