On Sat, Sep 2, 2023 at 3:48 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> That was very disappointing: I found it hard to explain, but was thinking
> of sending you a similar patch, doing the same check on all your 32 CPUs -
> maybe the stall being on CPU 0 in your photo was accidental.
>
> But now I think I have the shameful answer (which studying your dmesg,
> and the 82328 jiffies at 86 seconds in your photo, did help me towards).
>
> That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
> video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
> from the underflow in __rcu_read_unlock()).
>
> Please revert the debug patch I sent yesterday (or earlier today), please
> try booting with this one on top of a349d72fd9ef; and if that's successful,
> then please go back to your original Rawhide tree and apply this on top of
> that, to confirm that boots to a working system too - thanks.
>
> With my apologies,
>
> [PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()
>
> [ Commit message yet to be written: it's actually something to go to
>   6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
>   no case where it is actually hit. ]
>
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
>  mm/pagewalk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 2022333805d3..9e7d0276c38a 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			pte = pte_offset_map(pmd, addr);
>  		if (pte) {
>  			err = walk_pte_range_inner(pte, addr, end, walk);
> -			if (walk->mm != &init_mm)
> +			if (walk->mm != &init_mm && addr < TASK_SIZE)
>  				pte_unmap(pte);
>  		}
>  	} else {
> --
> 2.35.3

Great, this is the right patch.
Both builds, a349d72fd9ef and the latest Rawhide (now at 99d99825fc07),
work fine after applying this patch.
Thank you very much.

Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx>

--
Best Regards,
Mike Gavrilov.