On Tue, Jun 06, 2023 at 03:03:31PM -0400, Peter Xu wrote:
> On Tue, Jun 06, 2023 at 03:23:30PM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 05, 2023 at 08:40:01PM -0700, Hugh Dickins wrote:
> >
> > > diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> > > index 20652daa1d7e..e4f58c5fc2ac 100644
> > > --- a/arch/powerpc/mm/pgtable-frag.c
> > > +++ b/arch/powerpc/mm/pgtable-frag.c
> > > @@ -120,3 +120,54 @@ void pte_fragment_free(unsigned long *table, int kernel)
> > >  		__free_page(page);
> > >  	}
> > >  }
> > > +
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +#define PTE_FREE_DEFERRED 0x10000 /* beyond any PTE_FRAG_NR */
> > > +
> > > +static void pte_free_now(struct rcu_head *head)
> > > +{
> > > +	struct page *page;
> > > +	int refcount;
> > > +
> > > +	page = container_of(head, struct page, rcu_head);
> > > +	refcount = atomic_sub_return(PTE_FREE_DEFERRED - 1,
> > > +				     &page->pt_frag_refcount);
> > > +	if (refcount < PTE_FREE_DEFERRED) {
> > > +		pte_fragment_free((unsigned long *)page_address(page), 0);
> > > +		return;
> > > +	}
> >
> > From what I can tell power doesn't recycle the sub fragment into any
> > kind of free list. It just waits for the last fragment to be unused
> > and then frees the whole page.
> >
> > So why not simply go into pte_fragment_free() and do the call_rcu directly:
> >
> > 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> > 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> > 		if (!kernel)
> > 			pgtable_pte_page_dtor(page);
> > 		call_rcu(&page->rcu_head, free_page_rcu)
>
> We need to be careful on the lock being freed in pgtable_pte_page_dtor(),
> in Hugh's series IIUC we need the spinlock being there for the rcu section
> alongside the page itself. So even if to do so we'll need to also rcu call
> pgtable_pte_page_dtor() when needed.

Er yes, I botched that, the dtor and the free_page should be in the
rcu callback function

Jason