On 12/11/20 at 04:16pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We re-evaluated your last patch and it seems to be fixing the
> initial performance bug reported. During our previous testing,
> we did not apply the patch rightly hence it was reporting
> some issues.
>
> Here is the dmesg log confirming no delay in the draft patch.
>
> Vanilla (5.10 rc3)
> ------------------
>
> [    0.024011] On node 2 totalpages: 89391104
> [    0.024012]   Normal zone: 1445888 pages used for memmap
> [    0.024012]   Normal zone: 89391104 pages, LIFO batch:63
> [    2.054646] ACPI: PM-Timer IO Port: 0x448 --------------> 2 secs delay
>
> Patch
> ------
>
> [    0.024166] On node 2 totalpages: 89391104
> [    0.024167]   Normal zone: 1445888 pages used for memmap
> [    0.024167]   Normal zone: 89391104 pages, LIFO batch:63
> [    0.026694] ACPI: PM-Timer IO Port: 0x448 --------------> No delay
>
> Attached dmesg logs. Let me know if anything is needed from our end.

I posted a formal patchset to fix this issue. Patch 1 contains the fix
and is almost the same as the draft v2 patch I attached in this thread.
Please feel free to help test it, and add your Tested-by: tag in the
patch thread if possible.

>
> From: Rahul Gopakumar <gopakumarr@xxxxxxxxxx>
> Sent: 24 November 2020 8:33 PM
> To: bhe@xxxxxxxxxx <bhe@xxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx <linux-mm@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx <akpm@xxxxxxxxxxxxxxxxxxxx>; natechancellor@xxxxxxxxx <natechancellor@xxxxxxxxx>; ndesaulniers@xxxxxxxxxx <ndesaulniers@xxxxxxxxxx>; clang-built-linux@xxxxxxxxxxxxxxxx <clang-built-linux@xxxxxxxxxxxxxxxx>; rostedt@xxxxxxxxxxx <rostedt@xxxxxxxxxxx>; Rajender M <manir@xxxxxxxxxx>; Yiu Cho Lau <lauyiuch@xxxxxxxxxx>; Peter Jonasson <pjonasson@xxxxxxxxxx>; Venkatesh Rajaram <rajaramv@xxxxxxxxxx>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Baoquan,
>
> We applied the new patch to 5.10 rc3 and tested it. We are still
> observing the same page corruption issue which we saw with the
> old patch. This is causing 3 secs delay in boot time.
>
> Attached dmesg log from the new patch and also from vanilla
> 5.10 rc3 kernel.
>
> There are multiple lines like below in the dmesg log of the
> new patch.
>
> "BUG: Bad page state in process swapper pfn:ab08001"
>
> ________________________________________
> From: bhe@xxxxxxxxxx <bhe@xxxxxxxxxx>
> Sent: 22 November 2020 6:38 AM
> To: Rahul Gopakumar
> Cc: linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; natechancellor@xxxxxxxxx; ndesaulniers@xxxxxxxxxx; clang-built-linux@xxxxxxxxxxxxxxxx; rostedt@xxxxxxxxxxx; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > To which commit should we apply the draft patch. We tried applying
> > the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> > (the one we used for applying the previous patch) but it fails.
>
> I tested on 5.10-rc3+. You can append below change to the old patch in
> your testing kernel.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa6076e1a840..5e5b74e88d69 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
>  		return false;
>  
> +	if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> +		return true;
>  	/*
>  	 * We start only with one section of pages, more pages are added as
>  	 * needed until the rest of deferred pages are initialized.