Re: [PATCH RFC net-next/mm V2 1/2] page_pool: Remove workqueue in new shutdown scheme

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 27/04/2023 22.53, Toke Høiland-Jørgensen wrote:
@@ -490,11 +508,15 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
  skip_dma_unmap:
  	page_pool_clear_pp_info(page);
- /* This may be the last page returned, releasing the pool, so
-	 * it is not safe to reference pool afterwards.
-	 */
-	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
-	trace_page_pool_state_release(pool, page, count);
+	if (flags & PP_FLAG_SHUTDOWN)
+		hold_cnt = pp_read_hold_cnt(pool);
+
+	release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
+	trace_page_pool_state_release(pool, page, release_cnt);
+
+	/* In shutdown phase, last page will free pool instance */
+	if (flags & PP_FLAG_SHUTDOWN)
+		page_pool_free_attempt(pool, hold_cnt, release_cnt);
>
Since the assumption is that no new pages will be allocated once the
PP_FLAG_SHUTDOWN is set (i.e., hold_count can not increase in the case),
I don't think it matters what order you read the hold and release counts
in? So you could simplify the above to just:

+	if (flags & PP_FLAG_SHUTDOWN)
+		page_pool_free_attempt(pool, pp_read_hold_cnt(pool), release_cnt);
and drop the second check of the flag further up?

You could probably even lose the hold_cnt argument entirely from
page_pool_free_attempt() and just have it call pp_read_hold_cnt() directly?


I unfortunately think we have to keep this approach.

The purpose is to read out data from *pool, such that it is safe to call
page_pool_free_attempt() even when *pool memory have been freed.

I believe there is a race window between atomic_inc_return() and freeing
in page_pool_free_attempt(). (As we have tracepoints in this critical
section we might even be able to increase the chance of the race)

Imagine two CPUs freeing the last two PP pages.
Hold=2 which means when release_cnt reach 2 inflight is zero.

 CPU-1 : release_cnt 1 = atomic_inc_return();
 CPU-1 : gets preempted (or runs slow bpf-prog in tracepoint)
 CPU-2 : release_cnt 2 = atomic_inc_return();
 CPU-2 : page_pool_free_attempt(pool, 2, release_cnt=2);
 CPU-2 : find no-inflight -> calls page_pool_free(pool)
 CPU-1 : page_pool_free_attempt(pool, 2, release_cnt=1);
 CPU-1 : *use-after-free* deref pool->pages_state_hold_cnt


  }
  EXPORT_SYMBOL(page_pool_release_page);





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux