On Wed, Dec 09, 2020 at 06:28:50PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:12 AM, paulmck@xxxxxxxxxx wrote:
> > From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> >
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> >
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
>
> Sounds useful, yeah. It occurred to me at least once that we don't have a nice
> generic way to print this kind of info. I usually dig it from a crash dump...

Glad to hear that it might be helpful, and thank you for looking this over!

> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER flag when creating
> > the kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> >
> > Cc: Christoph Lameter <cl@xxxxxxxxx>
> > Cc: Pekka Enberg <penberg@xxxxxxxxxx>
> > Cc: David Rientjes <rientjes@xxxxxxxxxx>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: <linux-mm@xxxxxxxxx>
> > Reported-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> ...
>
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
>
> It should be possible to find out more about object being free or not, than you
> currently do. At least to find out if it's definitely free. When it appears
> allocated, it can be actually still free in some kind of e.g. per-cpu or
> per-node cache that would be infeasible to check. But that improvement to the
> output can be also added later. Also SLUB stores the freeing stacktrace, which
> might be useful...

I can see how this could help debugging a use-after-free situation, at
least as long as the poor sap that subsequently allocated it doesn't free
it.  I can easily add more fields to the kmem_provenance structure.
Maybe it would make sense to have another exported API to which you
provide a kmem_provenance structure, and which fills it in.

One caution though...  I rely on the object being allocated.  If it
officially might already be freed, complex and high-overhead
synchronization seems to be required to safely access the various data
structures.  So any use on an already-freed object is on a "you break it
you get to keep the pieces" basis.  On the other hand, if you are dealing
with a use-after-free situation, life is hard anyway.

Or am I missing your point?
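In case it helps make that concrete, here is the rough shape of such an
exported API (untested sketch only, the name kmem_query_obj() is made up,
and the caller must keep the object allocated for the duration):

	/*
	 * Untested sketch: caller supplies the kmem_provenance structure
	 * and guarantees that the object stays allocated.
	 */
	bool kmem_query_obj(const void *object, struct kmem_provenance *kpp)
	{
		struct page *page;

		if (!virt_addr_valid(object))
			return false;
		page = virt_to_head_page(object);
		if (!PageSlab(page))
			return false;
		kpp->kp_ptr = (void *)object;
		kpp->kp_page = page;
		kpp->kp_nstack = KS_ADDRS_COUNT;
		kmem_provenance(kpp);	/* Fills in kp_objp, kp_ret, kp_stack[]. */
		return true;
	}
	EXPORT_SYMBOL_GPL(kmem_query_obj);

If we also wanted the free-side information you mention, I could add
(say) kp_free_ret and kp_free_stack[] fields and fill them from the
TRACK_FREE record in the SLUB case, with the usual caveat that they would
be best-effort only for anything that might be freed and reallocated
concurrently.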
> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	int i;
> > +	struct page *page;
> > +	struct kmem_provenance kp;
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kp.kp_ptr = object;
> > +	kp.kp_page = page;
> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> > +	kmem_provenance(&kp);
>
> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
> would make sense in this patch already).

Good point!  However, please note that the various debugging options
reserve space at the beginning of the object.  This can make the meaning
of kp.kp_objp a bit different than one might expect.

> > +	if (page->slab_cache)
> > +		pr_cont(" slab %s", page->slab_cache->name);
> > +	else
> > +		pr_cont(" slab ");
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	if (kp.kp_stack[0]) {
> > +		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> > +			if (!kp.kp_stack[i])
> > +				break;
> > +			pr_info(" %pS\n", kp.kp_stack[i]);
> > +		}
> > +	}
> > +}
>
> ...
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b30be23..027fe0f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >  	return 0;
> >  }
> >
> > +void kmem_provenance(struct kmem_provenance *kpp)
> > +{
> > +#ifdef CONFIG_SLUB_DEBUG
>
> I'd expect at least the very basic stuff (kp_obj) to be possible to determine
> even under !CONFIG_SLUB_DEBUG?

And doing it that way even saves a line of code!  ;-)
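Concretely, the untested sketch below is the sort of reshuffle I am
picturing, with only the SLAB_STORE_USER and stacktrace pieces left under
the #ifdef.  It reuses the field names and helpers from the hunk quoted
here, so treat it as shape rather than final code:

	void kmem_provenance(struct kmem_provenance *kpp)
	{
		void *base;
		unsigned int objnr;
		void *objp;
		struct page *page = kpp->kp_page;
		struct kmem_cache *s = page->slab_cache;

		kpp->kp_ret = NULL;
		if (kpp->kp_nstack)
			kpp->kp_stack[0] = NULL;

		/* The object's base address needs no debug information. */
		base = page_address(page);
		objp = kasan_reset_tag(kpp->kp_ptr);
		objnr = obj_to_index(s, page, objp);
		kpp->kp_objp = base + s->size * objnr;

	#ifdef CONFIG_SLUB_DEBUG
		/*
		 * The SLAB_STORE_USER-dependent kp_ret and kp_stack[]
		 * code from the hunk below would go here, minus the
		 * nodebug: label.
		 */
	#endif
	}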
> > +	void *base;
> > +	int i;
> > +	void *object = kpp->kp_ptr;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	struct page *page = kpp->kp_page;
> > +	struct kmem_cache *s = page->slab_cache;
> > +	struct track *trackp;
> > +
> > +	base = page_address(page);
> > +	objp = kasan_reset_tag(object);
> > +	objp = restore_red_left(s, objp);
> > +	objnr = obj_to_index(s, page, objp);
> > +	objp = base + s->size * objnr;
> > +	kpp->kp_objp = objp;
> > +	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
> > +	    !(s->flags & SLAB_STORE_USER))
> > +		goto nodebug;
> > +	trackp = get_track(s, objp, TRACK_ALLOC);
> > +	kpp->kp_ret = (void *)trackp->addr;
> > +#ifdef CONFIG_STACKTRACE
> > +	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
> > +		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> > +		if (!kpp->kp_stack[i])
> > +			break;
> > +	}
> > +#endif
> > +	if (kpp->kp_stack && i < kpp->kp_nstack)
> > +		kpp->kp_stack[i] = NULL;
> > +	return;
> > +nodebug:
> > +#endif
> > +	kpp->kp_ret = NULL;
> > +	if (kpp->kp_nstack)
> > +		kpp->kp_stack[0] = NULL;
> > +}
> > +
> >  /********************************************************************
> >  *		Kmalloc subsystem
> >  *******************************************************************/
> > diff --git a/mm/util.c b/mm/util.c
> > index 4ddb6e1..d0e60d2 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
>
> I think mm/debug.c is a better fit as it already has dump_page() of a similar
> nature. Also you can call that from mem_dump_obj() at least in case when the
> more specific handlers fail. It will even include page_owner info if enabled! :)

I will count this as one vote for mm/debug.c.  Two things to consider,
though...  First, Joonsoo suggests that because this produces useful
information even without any debugging enabled, it is not really
debugging code as such.  Second, mm/debug.c does not include either
slab.h or vmalloc.h.  The second might not be a showstopper, but I was
interpreting this to mean that its role was less central.

							Thanx, Paul

> Thanks,
> Vlastimil
>
> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >  	kunmap_atomic(addr1);
> >  	return ret;
> >  }
> > +
> > +/**
> > + * mem_dump_obj - Print available provenance information
> > + * @object: object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For example, for a slab-cache object, the slab name is printed, and,
> > + * if available, the return address and stack trace from the allocation
> > + * of that object.
> > + */
> > +void mem_dump_obj(void *object)
> > +{
> > +	if (!virt_addr_valid(object)) {
> > +		pr_cont(" non-paged (local) memory.\n");
> > +		return;
> > +	}
> > +	if (kmem_valid_obj(object)) {
> > +		kmem_dump_obj(object);
> > +		return;
> > +	}
> > +	pr_cont(" non-slab memory.\n");
> > +}
> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
> >
>
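P.S.  In case it helps to see the intended call style, here is a purely
hypothetical caller (struct foo and the refcount check are made up);
the caller prints its preamble without a trailing newline and
mem_dump_obj() then completes the line via pr_cont():

	/* Hypothetical example only, not part of the patch. */
	struct foo {
		atomic_t refs;
		struct rcu_head rh;
	};

	static void foo_rcu_cb(struct rcu_head *rhp)
	{
		struct foo *fp = container_of(rhp, struct foo, rh);

		if (WARN_ON_ONCE(atomic_read(&fp->refs) != 0)) {
			pr_err("foo_rcu_cb: freeing %px with nonzero refcount, object", fp);
			mem_dump_obj(fp);	/* Continues the pr_err() line. */
		}
		kfree(fp);
	}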