On Sat, Apr 3, 2021 at 4:56 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 4/2/21 5:47 AM, Muchun Song wrote:
> > On Wed, Mar 31, 2021 at 11:42 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >>
> >> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in
> >> non-task context") was added to address the issue of free_huge_page
> >> being called from irq context. That commit hands off free_huge_page
> >> processing to a workqueue if !in_task. However, this doesn't cover
> >> all the cases as pointed out by 0day bot lockdep report [1].
> >>
> >> :  Possible interrupt unsafe locking scenario:
> >> :
> >> :        CPU0                    CPU1
> >> :        ----                    ----
> >> :   lock(hugetlb_lock);
> >> :                                local_irq_disable();
> >> :                                lock(slock-AF_INET);
> >> :                                lock(hugetlb_lock);
> >> :   <Interrupt>
> >> :     lock(slock-AF_INET);
> >>
> >> Shakeel has later explained that this is very likely TCP TX zerocopy
> >> from hugetlb pages scenario when the networking code drops a last
> >> reference to hugetlb page while having IRQ disabled. Hugetlb freeing
> >> path doesn't disable IRQ while holding hugetlb_lock so a lock dependency
> >> chain can lead to a deadlock.
> >>
> >> This commit addresses the issue by doing the following:
> >> - Make hugetlb_lock irq safe. This is mostly a simple process of
> >>   changing spin_*lock calls to spin_*lock_irq* calls.
> >> - Make subpool lock irq safe in a similar manner.
> >> - Revert the !in_task check and workqueue handoff.
> >>
> >> [1] https://lore.kernel.org/linux-mm/000000000000f1c03b05bc43aadc@xxxxxxxxxx/
> >>
> >> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> >> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> >> Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> >
> > Hi Mike,
> >
> > Today I pulled the newest code (next-20210401). I found that
> > alloc_and_dissolve_huge_page is not updated. In this function,
> > hugetlb_lock is still non-irq safe. Maybe you or Oscar need
> > to fix.
> >
> > Thanks.
>
> Thank you Muchun,
>
> Oscar's changes were not in Andrew's tree when I started on this series
> and I failed to notice their inclusion. In addition,
> isolate_or_dissolve_huge_page also needs updating as well as a change in
> set_max_huge_pages that was omitted while rebasing.
>
> Andrew, the following patch addresses those missing changes. Ideally,
> the changes should be combined/included in this patch. If you want me
> to send another version of this patch or another series, let me know.
>
> From 450593eb3cea895f499ddc343c22424c552ea502 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Fri, 2 Apr 2021 13:18:13 -0700
> Subject: [PATCH] hugetlb: fix irq locking omissions
>
> The patch "hugetlb: make free_huge_page irq safe" changed spin_*lock
> calls to spin_*lock_irq* calls. However, it missed several places
> in the file hugetlb.c. Add the overlooked changes.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

Thanks Mike. But there are still some places that need improvement.
See below.

> ---
>  mm/hugetlb.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c22111f3da20..a6bfc6bcbc81 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2284,7 +2284,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>          */
>         page_ref_dec(new_page);
> retry:
> -       spin_lock(&hugetlb_lock);
> +       spin_lock_irq(&hugetlb_lock);
>         if (!PageHuge(old_page)) {
>                 /*
>                  * Freed from under us. Drop new_page too.
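
The _irq variant looks right to me throughout this function, since
alloc_and_dissolve_huge_page() only runs in task context with IRQs
enabled (it even calls cond_resched() below). Just to spell out the
rule for anyone else following the thread, a minimal sketch --
illustration only, not part of the patch, and the names are made up:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void task_context_only_path(void)
{
	/* Always entered with IRQs enabled, so _irq is enough. */
	spin_lock_irq(&example_lock);
	/* ... touch data that is also used from IRQ context ... */
	spin_unlock_irq(&example_lock);
}

static void any_context_path(void)
{
	unsigned long flags;

	/*
	 * May be reached with IRQs already disabled, so save and
	 * restore the IRQ state instead of unconditionally
	 * re-enabling interrupts on unlock.
	 */
	spin_lock_irqsave(&example_lock, flags);
	/* ... same data ... */
	spin_unlock_irqrestore(&example_lock, flags);
}

If I remember Mike's series correctly, free_huge_page() is the one
path that needs the _irqsave variant, since it can be called with
IRQs already disabled.
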
> @@ -2297,7 +2297,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>                  * Fail with -EBUSY if not possible.
>                  */
>                 update_and_free_page(h, new_page);

Now update_and_free_page can be called without holding hugetlb_lock,
so we can move it out of the locked section. In this function, there
are 3 places that call update_and_free_page(). We can move all of them
out of hugetlb_lock. Right? This optimization can be squashed into the
commit "hugetlb: call update_and_free_page without hugetlb_lock".
A rough sketch of what I mean is at the end of this mail.

Thanks.

> -               spin_unlock(&hugetlb_lock);
> +               spin_unlock_irq(&hugetlb_lock);
>                 if (!isolate_huge_page(old_page, list))
>                         ret = -EBUSY;
>                 return ret;
> @@ -2307,7 +2307,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>          * freelist yet. Race window is small, so we can succed here if
>          * we retry.
>          */
> -       spin_unlock(&hugetlb_lock);
> +       spin_unlock_irq(&hugetlb_lock);
>         cond_resched();
>         goto retry;
>     } else {
> @@ -2323,7 +2323,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>                 __enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
>         }
> unlock:
> -       spin_unlock(&hugetlb_lock);
> +       spin_unlock_irq(&hugetlb_lock);
>
>         return ret;
>  }
> @@ -2339,15 +2339,15 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
>          * to carefully check the state under the lock.
>          * Return success when racing as if we dissolved the page ourselves.
>          */
> -       spin_lock(&hugetlb_lock);
> +       spin_lock_irq(&hugetlb_lock);
>         if (PageHuge(page)) {
>                 head = compound_head(page);
>                 h = page_hstate(head);
>         } else {
> -               spin_unlock(&hugetlb_lock);
> +               spin_unlock_irq(&hugetlb_lock);
>                 return 0;
>         }
> -       spin_unlock(&hugetlb_lock);
> +       spin_unlock_irq(&hugetlb_lock);
>
>         /*
>          * Fence off gigantic pages as there is a cyclic dependency between
> @@ -2737,7 +2737,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>          * pages in hstate via the proc/sysfs interfaces.
>          */
>         mutex_lock(&h->resize_lock);
> -       spin_lock(&hugetlb_lock);
> +       spin_lock_irq(&hugetlb_lock);
>
>         /*
>          * Check for a node specific request.
> --
> 2.30.2
>
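
As mentioned above, a rough and untested sketch of what I mean, based
only on the branch quoted in the second hunk (drop hugetlb_lock first,
then free new_page; if I am reading the function right, new_page is
still private to us at this point, so nothing else can see it):

		/*
		 * Someone has grabbed the page, try to isolate it here.
		 * Fail with -EBUSY if not possible.
		 */
		spin_unlock_irq(&hugetlb_lock);
		/*
		 * new_page has not been put on any freelist yet, so it
		 * can be freed without holding hugetlb_lock.
		 */
		update_and_free_page(h, new_page);
		if (!isolate_huge_page(old_page, list))
			ret = -EBUSY;
		return ret;

The same movement should work for the other two update_and_free_page()
callers in this function.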