On 27/04/2023 22.53, Toke Høiland-Jørgensen wrote:
@@ -490,11 +508,15 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
skip_dma_unmap:
page_pool_clear_pp_info(page);
- /* This may be the last page returned, releasing the pool, so
- * it is not safe to reference pool afterwards.
- */
- count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
- trace_page_pool_state_release(pool, page, count);
+ if (flags & PP_FLAG_SHUTDOWN)
+ hold_cnt = pp_read_hold_cnt(pool);
+
+ release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
+ trace_page_pool_state_release(pool, page, release_cnt);
+
+ /* In shutdown phase, last page will free pool instance */
+ if (flags & PP_FLAG_SHUTDOWN)
+ page_pool_free_attempt(pool, hold_cnt, release_cnt);
>
Since the assumption is that no new pages will be allocated once the
PP_FLAG_SHUTDOWN is set (i.e., hold_count can not increase in the case),
I don't think it matters what order you read the hold and release counts
in? So you could simplify the above to just:
+ if (flags & PP_FLAG_SHUTDOWN)
+ page_pool_free_attempt(pool, pp_read_hold_cnt(pool), release_cnt);
and drop the second check of the flag further up?
You could probably even lose the hold_cnt argument entirely from
page_pool_free_attempt() and just have it call pp_read_hold_cnt() directly?
I unfortunately think we have to keep this approach.
The purpose is to read out data from *pool, such that it is safe to call
page_pool_free_attempt() even when *pool memory have been freed.
I believe there is a race window between atomic_inc_return() and freeing
in page_pool_free_attempt(). (As we have tracepoints in this critical
section we might even be able to increase the chance of the race)
Imagine two CPUs freeing the last two PP pages.
Hold=2 which means when release_cnt reach 2 inflight is zero.
CPU-1 : release_cnt 1 = atomic_inc_return();
CPU-1 : gets preempted (or runs slow bpf-prog in tracepoint)
CPU-2 : release_cnt 2 = atomic_inc_return();
CPU-2 : page_pool_free_attempt(pool, 2, release_cnt=2);
CPU-2 : find no-inflight -> calls page_pool_free(pool)
CPU-1 : page_pool_free_attempt(pool, 2, release_cnt=1);
CPU-1 : *use-after-free* deref pool->pages_state_hold_cnt
}
EXPORT_SYMBOL(page_pool_release_page);