> void page_pool_destroy(struct page_pool *pool) > { > + unsigned int flags; > + u32 release_cnt; > + u32 hold_cnt; > + > if (!pool) > return; > > @@ -868,11 +894,45 @@ void page_pool_destroy(struct page_pool *pool) > if (!page_pool_release(pool)) > return; > > - pool->defer_start = jiffies; > - pool->defer_warn = jiffies + DEFER_WARN_INTERVAL; > + /* PP have pages inflight, thus cannot immediately release memory. > + * Enter into shutdown phase, depending on remaining in-flight PP > + * pages to trigger shutdown process (on concurrent CPUs) and last > + * page will free pool instance. > + * > + * There exist two race conditions here, we need to take into > + * account in the following code. > + * > + * 1. Before setting PP_FLAG_SHUTDOWN another CPU released the last > + * pages into the ptr_ring. Thus, it missed triggering shutdown > + * process, which can then be stalled forever. > + * > + * 2. After setting PP_FLAG_SHUTDOWN another CPU released the last > + * page, which triggered shutdown process and freed pool > + * instance. Thus, its not safe to dereference *pool afterwards. > + * > + * Handling races by holding a fake in-flight count, via artificially > + * bumping pages_state_hold_cnt, which assures pool isn't freed under > + * us. Use RCU Grace-Periods to guarantee concurrent CPUs will > + * transition safely into the shutdown phase. > + * > + * After safely transition into this state the races are resolved. For > + * race(1) its safe to recheck and empty ptr_ring (it will not free > + * pool). Race(2) cannot happen, and we can release fake in-flight count > + * as last step. > + */ > + hold_cnt = READ_ONCE(pool->pages_state_hold_cnt) + 1; > + WRITE_ONCE(pool->pages_state_hold_cnt, hold_cnt); > + synchronize_rcu(); > + > + flags = READ_ONCE(pool->p.flags) | PP_FLAG_SHUTDOWN; > + WRITE_ONCE(pool->p.flags, flags); > + synchronize_rcu(); Hmm, synchronize_rcu() can be quite expensive; why do we need two of them? Should be fine to just do one after those two writes, as long as the order of those writes is correct (which WRITE_ONCE should ensure)? Also, if we're adding this (blocking) operation in the teardown path we risk adding latency to that path (network interface removal, BPF_PROG_RUN syscall etc), so not sure if this actually ends up being an improvement anymore, as opposed to just keeping the workqueue but dropping the warning? -Toke