Hello Eric,
On 09/02/23 23:01, Eric DeVolder wrote:
On 2/8/23 07:44, Thomas Gleixner wrote:
Eric!
On Tue, Feb 07 2023 at 11:23, Eric DeVolder wrote:
On 2/1/23 05:33, Thomas Gleixner wrote:
So my latest solution is introduce two new CPUHP states,
CPUHP_AP_ELFCOREHDR_ONLINE
for onlining and CPUHP_BP_ELFCOREHDR_OFFLINE for offlining. I'm open
to better names.
The CPUHP_AP_ELFCOREHDR_ONLINE needs to be placed after
CPUHP_BRINGUP_CPU. My
attempts at locating this state failed when inside the STARTING
section, so I located
this just inside the ONLINE sectoin. The crash hotplug handler is
registered on
this state as the callback for the .startup method.
The CPUHP_BP_ELFCOREHDR_OFFLINE needs to be placed before
CPUHP_TEARDOWN_CPU, and I
placed it at the end of the PREPARE section. This crash hotplug
handler is also
registered on this state as the callback for the .teardown method.
TBH, that's still overengineered. Something like this:
bool cpu_is_alive(unsigned int cpu)
{
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
return data_race(st->state) <= CPUHP_AP_IDLE_DEAD;
}
and use this to query the actual state at crash time. That spares all
those callback heuristics.
I'm making my way though percpu crash_notes, elfcorehdr, vmcoreinfo,
makedumpfile and (the consumer of it all) the userspace crash utility,
in order to understand the impact of moving from for_each_present_cpu()
to for_each_online_cpu().
Is the packing actually worth the trouble? What's the actual win?
Thanks,
tglx
Thomas,
I've investigated the passing of crash notes through the vmcore. What
I've learned is that:
- linux/fs/proc/vmcore.c (which makedumpfile references to do its job)
does
not care what the contents of cpu PT_NOTES are, but it does coalesce
them together.
- makedumpfile will count the number of cpu PT_NOTES in order to
determine its
nr_cpus variable, which is reported in a header, but otherwise
unused (except
for sadump method).
- the crash utility, for the purposes of determining the cpus, does
not appear to
reference the elfcorehdr PT_NOTEs. Instead it locates the various
cpu_[possible|present|online]_mask and computes nr_cpus from that,
and also of
course which are online. In addition, when crash does reference the
cpu PT_NOTE,
to get its prstatus, it does so by using a percpu technique directly
in the vmcore
image memory, not via the ELF structure. Said differently, it
appears to me that
crash utility doesn't rely on the ELF PT_NOTEs for cpus; rather it
obtains them
via kernel cpumasks and the memory within the vmcore.
With this understanding, I did some testing. Perhaps the most telling
test was that I
changed the number of cpu PT_NOTEs emitted in the
crash_prepare_elf64_headers() to just 1,
hot plugged some cpus, then also took a few offline sparsely via
chcpu, then generated a
vmcore. The crash utility had no problem loading the vmcore, it
reported the proper number
of cpus and the number offline (despite only one cpu PT_NOTE), and
changing to a different
cpu via 'set -c 30' and the backtrace was completely valid.
My take away is that crash utility does not rely upon ELF cpu
PT_NOTEs, it obtains the
cpu information directly from kernel data structures. Perhaps at one
time crash relied
upon the ELF information, but no more. (Perhaps there are other crash
dump analyzers
that might rely on the ELF info?)
So, all this to say that I see no need to change
crash_prepare_elf64_headers(). There
is no compelling reason to move away from for_each_present_cpu(), or
modify the list for
online/offline.
Which then leaves the topic of the cpuhp state on which to register.
Perhaps reverting
back to the use of CPUHP_BP_PREPARE_DYN is the right answer. There
does not appear to
be a compelling need to accurately track whether the cpu went
online/offline for the
purposes of creating the elfcorehdr, as ultimately the crash utility
pulls that from
kernel data structures, not the elfcorehdr.
I think this is what Sourabh has known and has been advocating for an
optimization
path that allows not regenerating the elfcorehdr on cpu changes
(because all the percpu
structs are all laid out). I do think it best to leave that as an arch
choice.
Since things are clear on how the PT_NOTES are consumed in kdump kernel
[fs/proc/vmcore.c],
makedumpfile, and crash tool I need your opinion on this:
Do we really need to regenerate elfcorehdr for CPU hotplug events?
If yes, can you please list the elfcorehdr components that changes due
to CPU hotplug.
From what I understood, crash notes are prepared for possible CPUs as
system boots and
could be used to create a PT_NOTE section for each possible CPU while
generating the elfcorehdr
during the kdump kernel load.
Now once the elfcorehdr is loaded with PT_NOTEs for every possible CPU
there is no need to
regenerate it for CPU hotplug events. Or do we?
Thanks,
Sourabh Jain
_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec