Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()

David Hildenbrand <david@xxxxxxxxxx> · Wed, 16 Oct 2024 17:54:56 +0200

On 16.10.24 17:47, David Hildenbrand wrote:

When I wrote that code I was rather convinced that the variant in this patch
is the right thing to do.

A short explanation about what a stand-alone kdump is.

* First, it's not really a _regular_ kdump activated with kexec-tools and
    executed by Linux itself but a regular stand-alone dump (SCSI) from the
    FW's perspective (one has to use HMC or dumpconf to execute it and not
    with kexec-tools like for the _regular_ kdump).

Ah, that makes sense.

* One has to reserve crashkernel memory region in the old crashed kernel
    even if it remains unused until the dump starts.
* zipl uses regular kdump kernel and initramfs to create stand-alone
    dumper images and to write them to a dump disk which is used for
    IPLIng the stand-alone dumper.
* The zipl bootloader takes care of transferring the old kernel memory
    saved in HSA by the FW to the crashkernel memory region reserved by the old
    crashed kernel before it enters the dumper. The HSA memory is released
    by the zipl bootloader _before_ the dumper image is entered,
    therefore, we cannot use HSA to read old kernel memory, and instead
    use memory from crashkernel region, just like the regular kdump.
* is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
    the dumper like a regular stand-alone dump (e.g. zfcpdump).
* Summarized, zipl bootloader prepares an environment which is expected by
    the regular kdump for a stand-alone kdump dumper before it is entered.

Thanks for the details!


In my opinion, the correct version of is_kdump_kernel() would be

bool is_kdump_kernel(void)
{
          return oldmem_data.start;
}

because Linux kernel doesn't differentiate between both the regular
and the stand-alone kdump where it matters while performing dumper
operations (e.g. reading saved old kernel memory from crashkernel memory region).


Right, but if we consider "/proc/vmcore is available", a better version
would IMHO be:

bool is_kdump_kernel(void)
{
            return dump_available();
}

Because that is mostly (not completely) how is_kdump_kernel() would have
worked right now *after* we had the elfcorehdr_alloc() during the
fs_init call.


Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
is to tell us whether Linux kernel runs in a kdump like environment and not
whether the current mode is identical to the proper and true kdump,
right ? And if stand-alone kdump swims like a duck, quacks like one, then it
is one, regardless how it was started, by kexecing or IPLing
from a disk.

Same thinking here.


The stand-alone kdump has a very special use case which most users will
never encounter. And usually, one just takes zfcpdump instead which is
more robust and much smaller considering how big kdump initrd can get.
stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.

Makes sense, so it boils down to either

bool is_kdump_kernel(void)
{
           return oldmem_data.start;
}

Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
available or

bool is_kdump_kernel(void)
{
           return dump_available();
}

Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
available. There is the chance of is_kdump_kernel() being "true" if
"elfcorehdr_alloc()" fails with -ENODEV.


You're call :) Thanks!


What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:


From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@xxxxxxxxxx>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()

s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.

Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().

This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.

Update the documentation above dump_available().

Tested-by: Mario Casquero <mcasquer@xxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
 arch/s390/include/asm/kexec.h |  4 ++++
 arch/s390/kernel/crash_dump.c |  6 ++++++
 arch/s390/kernel/smp.c        | 16 ++++++++--------
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
 
 void arch_kexec_unprotect_crashkres(void);
 #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
 #endif
 
 #ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
 int arch_kimage_file_post_load_cleanup(struct kimage *image);
 #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
 #endif
+
 #endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
 						       prot);
 }
 
+bool is_kdump_kernel(void)
+{
+	return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
 static const char *nt_name(Elf64_Word type)
 {
 	const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
 
 /*
  * Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
  * 1) standard zfcp/nvme dump
  *    condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
  *    The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
  *    with sigp stop-and-store-status. The firmware or the boot-loader
  *    stored the registers of the boot CPU in the absolute lowcore in the
  *    memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- *    or stand-alone kdump for DASD
- *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
  *    The state for all CPUs except the boot CPU needs to be collected
  *    with sigp stop-and-store-status. The kexec code or the boot-loader
  *    stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- *    This case does not exist for s390 anymore, setup_arch explicitly
- *    deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
  */
 static bool dump_available(void)
 {
--
2.46.1


--
Cheers,

David / dhildenb