Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

On Thu, 2020-06-04 at 21:26 -0500, John Donnelly wrote:
> On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
> > > Hi All,
> > > 
> > > On Wed, Jun 3, 2020 at 9:03 PM John Donnelly <john.p.donnelly@xxxxxxxxxx>
> > > wrote:
> > > > > On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@xxxxxxxxxx> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > 
> > > > > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> > > > > > Hi Chen,
> > > > > > 
> > > > > > On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <
> > > > > > john.p.donnelly@xxxxxxxxxx
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
> > > > > > > > prabhakar.pkin@xxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
> > > > > > > > john.p.donnelly@xxxxxxxxxx> wrote:
> > > > > > > > > Hi .  See below !
> > > > > > > > > 
> > > > > > > > > > On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <
> > > > > > > > > > bhsharma@xxxxxxxxxx>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi John,
> > > > > > > > > > 
> > > > > > > > > > On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
> > > > > > > > > > John.P.donnelly@xxxxxxxxxx> wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> > > > > > > > > > > > Hi Chen,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
> > > > > > > > > > > > chenzhou10@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > This patch series enables reserving crashkernel above 4G in arm64.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > There are the following issues in arm64 kdump:
> > > > > > > > > > > > > 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> > > > > > > > > > > > > when there is not enough low memory.
> > > > > > > > > > > > > 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
> > > > > > > > > > > > > in this case, if swiotlb or DMA buffers are required, the crash dump kernel
> > > > > > > > > > > > > will fail to boot because there is no low memory available for allocation.
> > > > > > > > > > > > > 
> > > > > > > > > > > > We are getting a "warn_alloc" [1] warning during boot of the kdump
> > > > > > > > > > > > kernel with bootargs as [2] of the primary kernel.
> > > > > > > > > > > > This error is observed on the ThunderX2 ARM64 platform.
> > > > > > > > > > > > 
> > > > > > > > > > > > It is observed with latest upstream tag (v5.7-rc3) with
> > > > > > > > > > > > this patch set and
> > > > > > > > > > > > https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
> > > > > > > > > > > > Also **without** this patch-set
> > > > > > > > > > > > "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$"
> > > > > > > > > > > > 
> > > > > > > > > > > > This issue comes whenever crashkernel memory is reserved
> > > > > > > > > > > > after 0xc000_0000.
> > > > > > > > > > > > More details discussed earlier in
> > > > > > > > > > > > https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
> > > > > > > > > > > > without any solution
> > > > > > > > > > > > 
> > > > > > > > > > > > This patch-set is expected to solve a similar kind of issue,
> > > > > > > > > > > > i.e. low memory is only targeted for DMA and swiotlb. So the
> > > > > > > > > > > > above-mentioned observation should be considered/fixed.
> > > > > > > > > > > > 
> > > > > > > > > > > > --pk
> > > > > > > > > > > > 
> > > > > > > > > > > > [1]
> > > > > > > > > > > > [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> > > > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > > > [   30.367696] NET: Registered protocol family 16
> > > > > > > > > > > > [   30.369973] swapper/0: page allocation failure:
> > > > > > > > > > > > order:6,
> > > > > > > > > > > > mode:0x1(GFP_DMA),
> > > > > > > > > > > > nodemask=(null),cpuset=/,mems_allowed=0
> > > > > > > > > > > > [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > > > > > > > 5.7.0-rc3+ #121
> > > > > > > > > > > > [   30.369981] Hardware name: Cavium Inc. Saber/Saber,
> > > > > > > > > > > > BIOS
> > > > > > > > > > > > TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
> > > > > > > > > > > > [   30.369984] Call trace:
> > > > > > > > > > > > [   30.369989]  dump_backtrace+0x0/0x1f8
> > > > > > > > > > > > [   30.369991]  show_stack+0x20/0x30
> > > > > > > > > > > > [   30.369997]  dump_stack+0xc0/0x10c
> > > > > > > > > > > > [   30.370001]  warn_alloc+0x10c/0x178
> > > > > > > > > > > > [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
> > > > > > > > > > > > [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
> > > > > > > > > > > > [   30.370008]  alloc_page_interleave+0x24/0x98
> > > > > > > > > > > > [   30.370011]  alloc_pages_current+0xe4/0x108
> > > > > > > > > > > > [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
> > > > > > > > > > > > [   30.370020]  do_one_initcall+0x54/0x228
> > > > > > > > > > > > [   30.370027]  kernel_init_freeable+0x228/0x2cc
> > > > > > > > > > > > [   30.370031]  kernel_init+0x1c/0x110
> > > > > > > > > > > > [   30.370034]  ret_from_fork+0x10/0x18
> > > > > > > > > > > > [   30.370036] Mem-Info:
> > > > > > > > > > > > [   30.370064] active_anon:0 inactive_anon:0
> > > > > > > > > > > > isolated_anon:0
> > > > > > > > > > > > [   30.370064]  active_file:0 inactive_file:0
> > > > > > > > > > > > isolated_file:0
> > > > > > > > > > > > [   30.370064]  unevictable:0 dirty:0 writeback:0
> > > > > > > > > > > > unstable:0
> > > > > > > > > > > > [   30.370064]  slab_reclaimable:34
> > > > > > > > > > > > slab_unreclaimable:4438
> > > > > > > > > > > > [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
> > > > > > > > > > > > [   30.370064]  free:1537719 free_pcp:219 free_cma:0
> > > > > > > > > > > > [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > > > shmem:0kB
> > > > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > > > [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > isolated(anon):0kB
> > > > > > > > > > > > isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
> > > > > > > > > > > > shmem:0kB
> > > > > > > > > > > > shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> > > > > > > > > > > > writeback_tmp:0kB
> > > > > > > > > > > > unstable:0kB all_unreclaimable? no
> > > > > > > > > > > > [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB
> > > > > > > > > > > > high:0kB
> > > > > > > > > > > > reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
> > > > > > > > > > > > pagetables:0kB
> > > > > > > > > > > > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > > > > > > > > > [   30.370084] lowmem_reserve[]: 0 250 6063 6063
> > > > > > > > > > > > [   30.370090] Node 0 DMA32 free:256000kB min:408kB
> > > > > > > > > > > > low:664kB
> > > > > > > > > > > > high:920kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:269700kB managed:256000kB mlocked:0kB
> > > > > > > > > > > > kernel_stack:0kB
> > > > > > > > > > > > pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> > > > > > > > > > > > free_cma:0kB
> > > > > > > > > > > > [   30.370094] lowmem_reserve[]: 0 0 5813 5813
> > > > > > > > > > > > [   30.370100] Node 0 Normal free:5894876kB min:9552kB
> > > > > > > > > > > > low:15504kB
> > > > > > > > > > > > high:21456kB reserved_highatomic:0KB active_anon:0kB
> > > > > > > > > > > > inactive_anon:0kB
> > > > > > > > > > > > active_file:0kB inactive_file:0kB unevictable:0kB
> > > > > > > > > > > > writepending:0kB
> > > > > > > > > > > > present:8388608kB managed:5953112kB mlocked:0kB
> > > > > > > > > > > > kernel_stack:21672kB
> > > > > > > > > > > > pagetables:56kB bounce:0kB free_pcp:876kB
> > > > > > > > > > > > local_pcp:176kB
> > > > > > > > > > > > free_cma:0kB
> > > > > > > > > > > > [   30.370104] lowmem_reserve[]: 0 0 0 0
> > > > > > > > > > > > [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB
> > > > > > > > > > > > 0*64kB
> > > > > > > > > > > > 0*128kB
> > > > > > > > > > > > 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > > > > > > > > > > > [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB
> > > > > > > > > > > > 0*64kB 0*128kB
> > > > > > > > > > > > 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) =
> > > > > > > > > > > > 256000kB
> > > > > > > > > > > > [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME)
> > > > > > > > > > > > 2*16kB
> > > > > > > > > > > > (UE) 3*32kB
> > > > > > > > > > > > (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME)
> > > > > > > > > > > > 3*1024kB (ME)
> > > > > > > > > > > > 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
> > > > > > > > > > > > [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
> > > > > > > > > > > > hugepages_surp=0 hugepages_size=1048576kB
> > > > > > > > > > > > [   30.370130] 0 total pagecache pages
> > > > > > > > > > > > [   30.370132] 0 pages in swap cache
> > > > > > > > > > > > [   30.370134] Swap cache stats: add 0, delete 0, find
> > > > > > > > > > > > 0/0
> > > > > > > > > > > > [   30.370135] Free swap  = 0kB
> > > > > > > > > > > > [   30.370136] Total swap = 0kB
> > > > > > > > > > > > [   30.370137] 2164609 pages RAM
> > > > > > > > > > > > [   30.370139] 0 pages HighMem/MovableOnly
> > > > > > > > > > > > [   30.370140] 612331 pages reserved
> > > > > > > > > > > > [   30.370141] 0 pages hwpoisoned
> > > > > > > > > > > > [   30.370143] DMA: failed to allocate 256 KiB pool for
> > > > > > > > > > > > atomic
> > > > > > > > > > > > coherent allocation
> > > > > > > > > > > During my testing I saw the same error and Chen's solution
> > > > > > > > > > > corrected it.
> > > > > > > > > > Which combination are you using on your side? I am using
> > > > > > > > > > Prabhakar's suggested environment and can reproduce the issue
> > > > > > > > > > with or without Chen's crashkernel support above 4G patchset.
> > > > > > > > > > 
> > > > > > > > > > I am also using a ThunderX2 platform with latest makedumpfile
> > > > > > > > > > code and kexec-tools (with the suggested patch
> > > > > > > > > > <https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$>).
> > > > > > > > > > Thanks,
> > > > > > > > > > Bhupesh
> > > > > > > > > I did this activity 5 months ago and I have moved on to other
> > > > > > > > > activities. My DMA failures were related to PCI devices that could
> > > > > > > > > not be enumerated because low-DMA space was not available when
> > > > > > > > > crashkernel was moved above 4G; I don’t recall the exact platform.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > For this failure ,
> > > > > > > > > 
> > > > > > > > > > > > DMA: failed to allocate 256 KiB pool for atomic
> > > > > > > > > > > > coherent allocation
> > > > > > > > > Is due to :
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 3618082c
> > > > > > > > > ("arm64 use both ZONE_DMA and ZONE_DMA32")
> > > > > > > > > 
> > > > > > > > > With the introduction of ZONE_DMA to support the Raspberry DMA
> > > > > > > > > region below 1G, the crashkernel is placed in the upper
> > > > > > > > > ZONE_DMA32 region below 4G. Since the crashkernel does not have
> > > > > > > > > access to the ZONE_DMA region, it prints a call trace during
> > > > > > > > > bootup.
> > > > > > > > > 
> > > > > > > > > It is due to having this CONFIG item on:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > CONFIG_ZONE_DMA=y
> > > > > > > > > 
> > > > > > > > > Turning off ZONE_DMA fixes the issue, and Raspberry Pi 4 will
> > > > > > > > > use the device tree to specify memory below 1G.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Disabling ZONE_DMA is a temporary solution. We may need a proper
> > > > > > > > solution.
> > > > > > > Perhaps the Raspberry platform configuration dependencies need to be
> > > > > > > separated from “server class” Arm equipment? Or auto-configured on
> > > > > > > boot? Consult an expert ;-)
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > > I would like to see Chen’s feature added, perhaps as
> > > > > > > > > EXPERIMENTAL, so we can get some configuration testing done on
> > > > > > > > > it. It corrects having a DMA zone in low memory while the
> > > > > > > > > crashkernel is above 4GB. This has been going on for a year now.
> > > > > > > > I would also like this patch to be added to Linux as early as
> > > > > > > > possible.
> > > > > > > > 
> > > > > > > > The issue I mentioned happens with or without this patch.
> > > > > > > > 
> > > > > > > > This patch-set could consider fixing it, because it uses low memory
> > > > > > > > for DMA & swiotlb only.
> > > > > > > > We can consider restricting crashkernel within the required range,
> > > > > > > > like below:
> > > > > > > > 
> > > > > > > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > > > > > > index 7f9e5a6dc48c..bd67b90d35bd 100644
> > > > > > > > --- a/kernel/crash_core.c
> > > > > > > > +++ b/kernel/crash_core.c
> > > > > > > > @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
> > > > > > > >                        return 0;
> > > > > > > >        }
> > > > > > > > 
> > > > > > > > -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> > > > > > > > +       low_base = memblock_find_in_range(0, 0xc0000000, low_size, CRASH_ALIGN);
> > > > > > > >        if (!low_base) {
> > > > > > > >                pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> > > > > > > >                       (unsigned long)(low_size >> 20));
> > > > > > > > 
> > > > > > > > 
> > > > > > >     I suspect  0xc0000000  would need to be a CONFIG item  and not
> > > > > > > hard-coded.
> > > > > > > 
> > > > > > If you consider this a valid change, can you please incorporate it as
> > > > > > part of your patch-set?
> > > > > After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"), the
> > > > > 0-4G memory is split into DMA [mem 0x0000000000000000-0x000000003fffffff]
> > > > > and DMA32 [mem 0x0000000040000000-0x00000000ffffffff] on arm64.
> > > > > 
> > > > > From the above discussion, on your platform, the low crashkernel falls
> > > > > in the DMA32 region, but your environment needs to access the DMA
> > > > > region, so there is the call trace.
> > > > > 
> > > > > I have a question, why do you choose 0xc0000000 here?
> > > > > 
> > > > > Besides, this is common code, so we also need to consider x86.
> > > > > 
> > > >   + nsaenzjulienne@xxxxxxx
> > Thanks for adding me to the conversation, and sorry for the headaches.
> > 
> > > >    Exactly. This is why it needs to be a CONFIG option for Raspberry,
> > > > or a device tree option.
> > > > 
> > > > 
> > > >    We could revert 1a8e1cef7 since it broke  Arm kdump too.
> > > Well, unfortunately the patch for commit 1a8e1cef7603 ("arm64: use
> > > both ZONE_DMA and ZONE_DMA32") was not Cc'ed to the kexec mailing
> > > list, thus we couldn't get many eyes on it for a thorough review from
> > > kexec/kdump p-o-v.
> > > 
> > > Also we historically never had distinction in common arch code on the
> > > basis of the intended end use-case: embedded, server or automotive, so
> > > I am not sure introducing a Raspberry specific CONFIG option would be
> > > a good idea.
> > +1
> > 
> >  From the distros perspective it's very important to keep a single kernel
> > image.
> > 
> > > So, rather than reverting the patch, we can look at addressing the
> > > same properly this time - especially from a kdump p-o-v.
> > > This issue has been reported by some Red Hat arm64 partners with
> > > upstream kernel also and as we have noticed in the past as well,
> > > hardcoding the placement of the crashkernel base address (unless the
> > > base address is specified by a crashkernel=X@Y like bootargs) is also
> > > not a portable suggestion.
> > > 
> > > I am working on a possible fix and will have more updates on the same
> > > in a day-or-two.
> > Please keep me in the loop, we've also had issues pointing to this reported by
> > SUSE partners. I can do some testing both on the RPi4 and on big servers that
> > need huge crashkernel sizes.
> > 
> > Regards,
> > Nicolas
> > 
> Hi Nicolas,
> 
> 
> You may want to review this topic across the various email threads. It
> has been a long journey.

Will do, thanks!

Regards,
Nicolas

Attachment: signature.asc
Description: This is a digitally signed message part
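
For readers following the zone discussion quoted above, here is a minimal user-space sketch in C of the arm64 layout described in the thread: DMA covers 0x00000000-0x3fffffff and DMA32 covers 0x40000000-0xffffffff. It is only an illustration of why a low crashkernel region placed at or above 1G contains no ZONE_DMA memory, so a GFP_DMA allocation in the kdump kernel has nothing to draw from. The helper names zone_of() and dma_bytes_in_range() are hypothetical and are not kernel APIs, and treating everything above 4G as "Normal" is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zone boundaries as quoted in the thread for arm64 after
 * "arm64: use both ZONE_DMA and ZONE_DMA32":
 *   DMA   [0x00000000, 0x3fffffff]
 *   DMA32 [0x40000000, 0xffffffff]
 * Assumption for this sketch: everything at or above 4G is Normal.
 */
#define ZONE_DMA_END   0x40000000ULL   /* exclusive upper bound, 1G */
#define ZONE_DMA32_END 0x100000000ULL  /* exclusive upper bound, 4G */

enum zone { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

/* Hypothetical helper: which zone does a physical address fall in? */
static enum zone zone_of(uint64_t phys)
{
    if (phys < ZONE_DMA_END)
        return ZONE_DMA;
    if (phys < ZONE_DMA32_END)
        return ZONE_DMA32;
    return ZONE_NORMAL;
}

/*
 * Hypothetical helper: how many bytes of the reserved range
 * [base, base + size) lie inside ZONE_DMA?
 */
static uint64_t dma_bytes_in_range(uint64_t base, uint64_t size)
{
    uint64_t end = base + size;

    assert(size > 0 && end > base);  /* non-empty range, no wrap-around */
    if (base >= ZONE_DMA_END)
        return 0;
    return (end < ZONE_DMA_END ? end : ZONE_DMA_END) - base;
}
```

With these helpers, a 256MB low reservation starting at 0xc0000000 is classified as DMA32 and dma_bytes_in_range() reports zero bytes of ZONE_DMA for it, which matches the "Node 0 DMA free:0kB ... managed:0kB" line in the quoted log.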

