Re: [PATCH v7 3/4] mm,page_owner: Display all stacks and their count

Oscar Salvador <osalvador@xxxxxxx> · Fri, 9 Feb 2024 22:52:48 +0100

On Fri, Feb 09, 2024 at 09:00:40AM +0100, Marco Elver wrote:
> > +/**
> > + * stack_depot_get_next_stack - Returns all stacks, one at a time
> 
> "Returns all stack_records" to be clear that this is returning the struct.

Fixed.

> 
> > + *
> > + * @table:     Current table we are checking
> > + * @bucket:    Current bucket we are checking
> > + * @last_found:        Last stack that was found
> > + *
> > + * This function finds first a non-empty bucket and returns the first stack
> > + * stored in it. On consequent calls, it walks the bucket to see whether
> > + * it contains more stacks.
> > + * Once we have walked all the stacks in a bucket, we check
> > + * the next one, and we repeat the same steps until we have checked all of them
> 
> I think for this function it's important to say that no entry returned
> from this function can be evicted.
> 
> I.e. the easiest way to ensure this is that the caller makes sure the
> entries returned are never passed to stack_depot_put() - which is
> certainly the case for your usecase because you do not use
> stack_depot_put().
> 
> > + * Return: A pointer a to stack_record struct, or NULL when we have walked all
> > + * buckets.
> > + */
> > +struct stack_record *stack_depot_get_next_stack(unsigned long *table,
> 
> To keep consistent, I'd also call this
> __stack_depot_get_next_stack_record(), so that we're clear this is
> more of an internal function not for general usage.
> 
> > +                                               struct list_head **bucket,
> > +                                               struct stack_record **last_found);
> > +
> >  /**
> >   * stack_depot_fetch - Fetch a stack trace from stack depot
> >   *
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index 197c355601f9..107bd0174cd6 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -782,6 +782,52 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
> >  }
> >  EXPORT_SYMBOL(stack_depot_get_extra_bits);
> >
> > +struct stack_record *stack_depot_get_next_stack(unsigned long *table,
> > +                                               struct list_head **curr_bucket,
> > +                                               struct stack_record **last_found)
> > +{
> > +       struct list_head *bucket = *curr_bucket;
> > +       unsigned long nr_table = *table;
> > +       struct stack_record *found = NULL;
> > +       unsigned long stack_table_entries = stack_hash_mask + 1;
> > +
> > +       rcu_read_lock_sched_notrace();
> 
> We are returning pointers to stack_records out of the RCU-read
> critical section, which are then later used to continue the iteration.
> list_for_each_entry_continue_rcu() says this is fine if "... you held
> some sort of non-RCU reference (such as a reference count) ...".
> Updating the function's documentation to say none of these entries can
> be evicted via a stack_depot_put() is required.

Thinking about it some more, I think I made a mistake:

I am walking all buckets, and within those buckets there are not only
page_owner stack_records, which means that I could return a stack_record
from e.g: KASAN (which I think can evict stack_records) and then
everything goes off the rails.
Which means I cannot walk the buckets like that.

Actually, I think that having something like the following

 struct list_stack_records {
      struct stack_record *stack;
      struct list_stack_records *next;
 }

in page_owner would make sense.
Then the only thing I would have to do is to add a new record on every
new stack_record, and then I could just walk the list like a linked
list.

Which means that the function stack_depot_get_next_stack() could be
killed because everything would happen in page_owner code.

e.g:

 static void inc_stack_record_count(depot_stack_handle_t handle)
 {
         struct stack_record *stack = __stack_depot_get_stack_record(handle);

         if (stack) {
                 /*
                  * New stack_record's that do not use STACK_DEPOT_FLAG_GET start
                  * with REFCOUNT_SATURATED to catch spurious increments of their
                  * refcount.
                  * Since we do not use STACK_DEPOT_FLAG_{GET,PUT} API, let us
                  * set a refcount of 1 ourselves.
                  */
                 if (refcount_read(&stack->count) == REFCOUNT_SATURATED) {
                         refcount_set(&stack->count, 1);
			 add_new_stack_record_into_the_list(stack)
		 }
                 refcount_inc(&stack->count);
         }
 }

and then just walk the list_stack_records list whenever we want to
show the stacktraces and their counting.

I think that overall this approach is cleaner and safer.

-- 
Oscar Salvador
SUSE Labs