On 2/13/24 10:21, Marco Elver wrote:
> On Tue, 13 Feb 2024 at 10:16, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 2/12/24 23:30, Oscar Salvador wrote:
>> > page_owner needs to increment a stack_record refcount when a new allocation
>> > occurs, and decrement it on a free operation.
>> > In order to do that, we need to have a way to get a stack_record from a
>> > handle.
>> > Implement __stack_depot_get_stack_record() which just does that, and make
>> > it public so page_owner can use it.
>> >
>> > Also implement {inc,dec}_stack_record_count() which increments
>> > or decrements on respective allocation and free operations, via
>> > __reset_page_owner() (free operation) and __set_page_owner() (alloc
>> > operation).
>> >
>> > Traversing all stackdepot buckets comes with its own complexity,
>> > plus we would have to implement a way to mark only those stack_records
>> > that were originated from page_owner, as those are the ones we are
>> > interested in.
>> > For that reason, page_owner maintains its own list of stack_records,
>> > because traversing that list is faster than traversing all buckets
>> > while keeping at the same time a low complexity.
>> > inc_stack_record_count() is responsible of adding new stack_records
>> > into the list stack_list.
>> >
>> > Modifications on the list are protected via a spinlock with irqs
>> > disabled, since this code can also be reached from IRQ context.
>> >
>> > Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
>> > ---
>> >  include/linux/stackdepot.h |  9 +++++
>> >  lib/stackdepot.c           |  8 +++++
>> >  mm/page_owner.c            | 73 ++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 90 insertions(+)
>>
>> ...
>>
>> > --- a/mm/page_owner.c
>> > +++ b/mm/page_owner.c
>> > @@ -36,6 +36,14 @@ struct page_owner {
>> >  	pid_t free_tgid;
>> >  };
>> >
>> > +struct stack {
>> > +	struct stack_record *stack_record;
>> > +	struct stack *next;
>> > +};
>> > +
>> > +static struct stack *stack_list;
>> > +static DEFINE_SPINLOCK(stack_list_lock);
>> > +
>> >  static bool page_owner_enabled __initdata;
>> >  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
>> >
>> > @@ -61,6 +69,57 @@ static __init bool need_page_owner(void)
>> >  	return page_owner_enabled;
>> >  }
>> >
>> > +static void add_stack_record_to_list(struct stack_record *stack_record)
>> > +{
>> > +	unsigned long flags;
>> > +	struct stack *stack;
>> > +
>> > +	stack = kmalloc(sizeof(*stack), GFP_KERNEL);
>>
>> I doubt you can use GFP_KERNEL unconditionally? Think you need to pass down
>> the gfp flags from __set_page_owner() here?
>>
>> And what about the alloc failure case, this will just leave the stack record
>> unlinked forever? Can we somehow know which ones we failed to link, and try
>> next time? Probably easier by not recording the stack for the page at all in
>> that case, so the next attempt with the same stack looks like the very first
>> again and attempts the add to list.
>>
>> Still not happy that these extra tracking objects are needed, but I guess
>> the per-user stack depot instances I suggested would be a major change.
>>
>> > +	if (stack) {
>> > +		stack->stack_record = stack_record;
>> > +		stack->next = NULL;
>> > +
>> > +		spin_lock_irqsave(&stack_list_lock, flags);
>> > +		if (!stack_list) {
>> > +			stack_list = stack;
>> > +		} else {
>> > +			stack->next = stack_list;
>> > +			stack_list = stack;
>> > +		}
>> > +		spin_unlock_irqrestore(&stack_list_lock, flags);
>> > +	}
>> > +}
>> > +
>> > +static void inc_stack_record_count(depot_stack_handle_t handle)
>> > +{
>> > +	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
>> > +
>> > +	if (stack_record) {
>> > +		/*
>> > +		 * New stack_record's that do not use STACK_DEPOT_FLAG_GET start
>> > +		 * with REFCOUNT_SATURATED to catch spurious increments of their
>> > +		 * refcount.
>> > +		 * Since we do not use STACK_DEPOT_FLAG_{GET,PUT} API, let us
>> > +		 * set a refcount of 1 ourselves.
>> > +		 */
>> > +		if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
>> > +			refcount_set(&stack_record->count, 1);
>>
>> Isn't this racy? Shouldn't we use some atomic cmpxchg operation to change
>> from REFCOUNT_SATURATED to 1?
>
> If 2 threads race here, both will want to add it to the list as well
> and take the lock. So this could just be solved with double-checked
> locking:
>
> if (count == REFCOUNT_SATURATED) {
>     spin_lock(...);

Yeah probably stack_list_lock could be taken here already. But then the
kmalloc() of struct stack must happen also here, before taking the lock.

>     if (count == REFCOUNT_SATURATED) {
>         refcount_set(.., 1);
>         .. add to list ...
>     }
>     spin_unlock(..);
> }
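
Something like this completely untested sketch is what I'd imagine, assuming
the gfp flags get passed down from __set_page_owner() as mentioned above, and
with add_stack_record_to_list() effectively inlined so the allocation can be
pulled out of the locked section; the rest of inc_stack_record_count() (the
actual refcount bump) would stay as it is in the patch:

static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp)
{
	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
	struct stack *stack;
	unsigned long flags;

	if (!stack_record)
		return;

	if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
		/* Allocate before taking the spinlock. */
		stack = kmalloc(sizeof(*stack), gfp);
		if (!stack)
			return;	/* ties into the alloc failure question above */

		spin_lock_irqsave(&stack_list_lock, flags);
		/* Re-check under the lock in case another CPU won the race. */
		if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
			refcount_set(&stack_record->count, 1);
			stack->stack_record = stack_record;
			stack->next = stack_list;
			stack_list = stack;
			stack = NULL;
		}
		spin_unlock_irqrestore(&stack_list_lock, flags);
		/* Only non-NULL if we lost the race; kfree(NULL) is a no-op. */
		kfree(stack);
	}

	/* ... rest of the function (the refcount increment) as in the patch ... */
}

The early return on kmalloc() failure would still need the caller to avoid
recording the handle for the page, which this sketch doesn't show.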