On 12/17/19 1:55 PM, Davidlohr Bueso wrote:
> On Tue, 17 Dec 2019, Waiman Long wrote:
>> Both the hugetlb_lock and the subpool lock can be acquired in
>> free_huge_page(). One way to solve the problem is to make both locks
>> irq-safe. However, Mike Kravetz had learned that the hugetlb_lock is
>> held for a linear scan of ALL hugetlb pages during a cgroup reparenting
>> operation. So it is just too long to have irqs disabled unless we can
>> break hugetlb_lock down into finer-grained locks with shorter lock
>> hold times.
>>
>> Another alternative is to defer the freeing to a workqueue job. This
>> patch implements the deferred freeing by adding a free_hpage_workfn()
>> work function to do the actual freeing. The free_huge_page() call in
>> a non-task context saves the page to be freed in the hpage_freelist
>> linked list in a lockless manner using the llist APIs.
>>
>> The generic workqueue is used to process the work, but a dedicated
>> workqueue can be used instead if it is desirable to have the huge page
>> freed ASAP.
>>
>> Thanks to Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> for suggesting the use
>> of llist APIs which simplify the code.
>>
>> [v2: Add more comments & remove unneeded racing check]
>> [v3: Update commit log, remove pr_debug & use llist APIs]
>
> Very creative reusing the mapping pointer, along with the llist API;
> this solves the problem nicely (temporarily at least).
>
> Two small nits below.
>
> Acked-by: Davidlohr Bueso <dbueso@xxxxxxx>
>
>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>> ---
>>  mm/hugetlb.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> +static LLIST_HEAD(hpage_freelist);
>> +
>> +static void free_hpage_workfn(struct work_struct *work)
>> +{
>> +	struct llist_node *node;
>> +	struct page *page;
>> +
>> +	node = llist_del_all(&hpage_freelist);
>> +
>> +	while (node) {
>> +		page = container_of((struct address_space **)node,
>> +				    struct page, mapping);
>> +		node = node->next;
>
> llist_next()

I could use this helper, but the statement is simple enough to understand as is.

>
>> +		__free_huge_page(page);
>> +	}
>> +}
>> +static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
>> +
>> +void free_huge_page(struct page *page)
>> +{
>> +	/*
>> +	 * Defer freeing if in non-task context to avoid hugetlb_lock
>> +	 * deadlock.
>> +	 */
>> +	if (!in_task()) {
>
> unlikely()?

Yes, I could use that too. For now, I am not going to post a v4 with these changes unless there are other substantial changes that require a respin.

Thanks,
Longman
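
For context, the producer side of this llist + workqueue pattern would look roughly like the sketch below. The quoted diff is truncated at the in_task() check, so this is an illustration under assumptions rather than the literal patch: the names hpage_freelist, free_hpage_work, and __free_huge_page() are taken from the quoted hunks, and the body of the deferred path is inferred from the pattern described in the commit log.

```c
/*
 * Hypothetical sketch, not the exact patch code.
 *
 * The llist_node is overlaid on page->mapping, which holds no useful
 * data by the time a hugetlb page is freed, so no extra storage is
 * needed. llist_add() is lockless and therefore safe to call from any
 * context, including the softirq context that motivated this patch.
 */
void free_huge_page(struct page *page)
{
	/*
	 * Defer freeing if in non-task context to avoid hugetlb_lock
	 * deadlock.
	 */
	if (!in_task()) {
		/*
		 * llist_add() returns true if the list was previously
		 * empty, so the work item is only scheduled once per
		 * batch of deferred pages.
		 */
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_freelist))
			schedule_work(&free_hpage_work);
		return;
	}

	__free_huge_page(page);
}
```

The consumer, free_hpage_workfn() in the quoted diff, runs in process context where taking hugetlb_lock is safe; llist_del_all() atomically detaches the whole batch, after which the nodes can be walked without any locking.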