On 10/12/23 17:12, Mike Kravetz wrote:
> On 10/12/23 07:53, Mike Kravetz wrote:
> > On 10/11/23 17:03, Nathan Chancellor wrote:
> > > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > > On 10/09/23 15:56, Usama Arif wrote:
> > 
> > Thank you Nathan!  That is very helpful.
> > 
> > I will use this information to try and recreate.  If I can recreate, I
> > should be able to get to root cause.
> 
> I could easily recreate the issue using the provided instructions.  First
> thing I did was add a few printk's to check/verify state.  The beginning
> of gather_bootmem_prealloc looked like this:

Hi Nathan,

This is looking more and more like a Clang issue to me.  I did a little
more problem isolation today.  Here is what I did:

- Check out commit "hugetlb: restructure pool allocations" in linux-next
- Fix the known issue with early disable/enable IRQs via locking by
  applying:

commit 266789498210dff6cf9a14b64fa3a5cb2fcc5858
Author: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Date:   Fri Oct 13 13:14:15 2023 -0700

    fix prep_and_add_allocated_folios locking

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c843506654f8..d8ab2d9b391b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2246,15 +2246,16 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 static void prep_and_add_allocated_folios(struct hstate *h,
 					struct list_head *folio_list)
 {
+	unsigned long flags;
 	struct folio *folio, *tmp_f;
 
 	/* Add all new pool pages to free lists in one lock cycle */
-	spin_lock_irq(&hugetlb_lock);
+	spin_lock_irqsave(&hugetlb_lock, flags);
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
 	}
-	spin_unlock_irq(&hugetlb_lock);
+	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
 
 /*

- Add the following code, which would only trigger a BUG if we were to
  traverse an empty list; which should NEVER happen.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d8ab2d9b391b..be234831b33f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3294,11 +3294,21 @@ static void __init gather_bootmem_prealloc(void)
 	LIST_HEAD(folio_list);
 	struct huge_bootmem_page *m;
 	struct hstate *h, *prev_h = NULL;
+	bool empty;
+
+	empty = list_empty(&huge_boot_pages);
+	if (empty)
+		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");
 
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = (void *)page;
 
+		if (empty) {
+			printk("    Traversing an empty list as if not empty!!!\n");
+			BUG();
+		}
+
 		h = m->hstate;
 		/*
 		 * It is possible to have multiple huge page sizes (hstates)

- As you have experienced, this will BUG if built with LLVM 17.0.2 and
  CONFIG_INIT_STACK_NONE
- It will NOT BUG if built with LLVM 13.0.1, but will BUG if built with
  llvm-14.0.6-x86_64 and later.

As mentioned in the previous email, the generated code for loop entry
looks wrong to my untrained eyes.  Can you or someone on the llvm team
take a look?
-- 
Mike Kravetz