Hello.

On Wednesday, 7 September 2022 0:22:42 CEST Eric W. Biederman wrote:
> Oleksandr Natalenko <oleksandr@xxxxxxxxxx> writes:
>
> > Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> >
> > Currently, it is not possible to find out what CPU the segfault happened on.
> > There are at least two attempts to improve segfault logging with this regard,
> > but they do not help in case the logs rotate.
> >
> > Hence, lets make sure it is possible to permanently record a CPU
> > the task ran on using a new core_pattern specifier.
>
> I am puzzled why make it part of the file name, and not part of the
> core file?  Say an elf note?

This might be a good idea too, and one approach doesn't exclude the other.

> The big advantage is that you could always capture the cpu and
> will not need to take special care configuring your system to
> capture that information.

The advantage of having the CPU recorded in the file name is that when there
are multiple core files, one can summarise them with a simple ls+grep to find
out whether the segfaults happened on the same CPU, without invoking a
fully-featured debugger.

Thanks.
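
To illustrate the ls+grep point, here is a minimal sketch of such a summary.
It assumes a hypothetical core_pattern ending in ".cpu%C" (for example
"core.%e.%p.cpu%C"); that suffix, the regular expression and the function
names are illustrative only and are not part of the patch.

```python
import os
import re
import sys
from collections import Counter

# Assumption: core_pattern ends in ".cpu%C", e.g. "core.%e.%p.cpu%C".
# Adjust the regex to whatever pattern is actually configured.
CPU_SUFFIX = re.compile(r"\.cpu(\d+)$")


def cpu_from_core_name(name):
    """Return the CPU number encoded in a core file name, or None."""
    match = CPU_SUFFIX.search(name)
    return int(match.group(1)) if match else None


def count_cores_per_cpu(names):
    """Count core files per CPU, skipping names without a CPU suffix."""
    cpus = (cpu_from_core_name(name) for name in names)
    return Counter(cpu for cpu in cpus if cpu is not None)


if __name__ == "__main__":
    # Directory holding the core files; defaults to the current directory.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = count_cores_per_cpu(os.listdir(directory))
    for cpu, count in counts.most_common():
        print(f"CPU {cpu}: {count} core file(s)")
```

A CPU that accounts for most of the core files across unrelated executables
would be the one worth inspecting first.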

> Eric
>
> > Suggested-by: Renaud Métrich <rmetrich@xxxxxxxxxx>
> > Signed-off-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>
> > ---
> >  Documentation/admin-guide/sysctl/kernel.rst | 1 +
> >  fs/coredump.c                               | 5 +++++
> >  include/linux/coredump.h                    | 1 +
> >  3 files changed, 7 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 835c8844bba48..b566fff04946b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -169,6 +169,7 @@ core_pattern
> >      %f          executable filename
> >      %E          executable path
> >      %c          maximum size of core file by resource limit RLIMIT_CORE
> > +    %C          CPU the task ran on
> >      %<OTHER>    both are dropped
> >      ========    ==========================================
> >
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index a8661874ac5b6..166d1f84a9b17 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >                  err = cn_printf(cn, "%lu",
> >                                  rlimit(RLIMIT_CORE));
> >                  break;
> > +            /* CPU the task ran on */
> > +            case 'C':
> > +                err = cn_printf(cn, "%d", cprm->cpu);
> > +                break;
> >              default:
> >                  break;
> >              }
> > @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >           */
> >          .mm_flags = mm->flags,
> >          .vma_meta = NULL,
> > +        .cpu = raw_smp_processor_id(),
> >      };
> >
> >      audit_core_dumps(siginfo->si_signo);
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index 08a1d3e7e46d0..191dcf5af6cb9 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -22,6 +22,7 @@ struct coredump_params {
> >      struct file *file;
> >      unsigned long limit;
> >      unsigned long mm_flags;
> > +    int cpu;
> >      loff_t written;
> >      loff_t pos;
> >      loff_t to_skip;

-- 
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer