Re: [PATCH] core_pattern: add CPU specifier


 



Hello.

On Wednesday, 7 September 2022 0:22:42 CEST Eric W. Biederman wrote:
> Oleksandr Natalenko <oleksandr@xxxxxxxxxx> writes:
> 
> > Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> >
> > Currently, it is not possible to find out what CPU the segfault happened on.
> > There are at least two attempts to improve segfault logging in this regard,
> > but they do not help in case the logs rotate.
> >
> > Hence, let's make sure it is possible to permanently record a CPU
> > the task ran on using a new core_pattern specifier.
> 
> I am puzzled why make it part of the file name, and not part of the
> core file?  Say an elf note?

This might be a good idea too, and one approach doesn't exclude the other one.

> The big advantage is that you could always capture the cpu and
> will not need to take special care configuring your system to
> capture that information.

The advantage of having the CPU recorded in the file name is that, when there are multiple core files, one can summarise them with a simple ls+grep and find out whether the segfaults happened on the same CPU, without invoking a full-featured debugger on each file.
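
To illustrate the file-name approach, here is a minimal userspace sketch in C. It assumes a hypothetical core_pattern such as core.%e.%p.cpu%C, so that the CPU number is the trailing field of each core file name. The ".cpu" tag and both function names are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker that precedes the %C expansion in the file name. */
#define CPU_TAG ".cpu"

/*
 * Return the CPU number encoded in a core file name, or -1 if the name
 * does not end in ".cpu<digits>".
 */
static int core_name_to_cpu(const char *name)
{
	const char *tag = NULL;
	const char *p;
	char *end;
	long cpu;

	assert(name != NULL);

	/* Use the last occurrence of the tag, since %e may contain it too. */
	for (p = strstr(name, CPU_TAG); p; p = strstr(p + 1, CPU_TAG))
		tag = p;
	if (!tag)
		return -1;

	/* Require at least one digit right after the tag. */
	p = tag + strlen(CPU_TAG);
	if (*p < '0' || *p > '9')
		return -1;

	errno = 0;
	cpu = strtol(p, &end, 10);
	if (errno || *end != '\0' || cpu > INT_MAX)
		return -1;

	return (int)cpu;
}

/*
 * Count how many of the given core file names were produced on each CPU.
 * counts[] must have ncpus entries. Names without a parsable CPU, or with
 * a CPU number >= ncpus, are skipped. Returns the number of names counted.
 */
static size_t tally_cores_by_cpu(const char *const *names, size_t n,
				 unsigned int *counts, size_t ncpus)
{
	size_t i, counted = 0;

	assert(names != NULL && counts != NULL);
	memset(counts, 0, ncpus * sizeof(*counts));

	for (i = 0; i < n; i++) {
		int cpu = core_name_to_cpu(names[i]);

		if (cpu < 0 || (size_t)cpu >= ncpus)
			continue;
		counts[cpu]++;
		counted++;
	}
	return counted;
}
```

Feeding it the names from a directory listing gives a per-CPU histogram of core files; a CPU whose count stands out would be a candidate for closer inspection.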

Thanks.

> Eric
> 
> > Suggested-by: Renaud Métrich <rmetrich@xxxxxxxxxx>
> > Signed-off-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>
> > ---
> >  Documentation/admin-guide/sysctl/kernel.rst | 1 +
> >  fs/coredump.c                               | 5 +++++
> >  include/linux/coredump.h                    | 1 +
> >  3 files changed, 7 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 835c8844bba48..b566fff04946b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -169,6 +169,7 @@ core_pattern
> >  	%f      	executable filename
> >  	%E		executable path
> >  	%c		maximum size of core file by resource limit RLIMIT_CORE
> > +	%C		CPU the task ran on
> >  	%<OTHER>	both are dropped
> >  	========	==========================================
> >  
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index a8661874ac5b6..166d1f84a9b17 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  				err = cn_printf(cn, "%lu",
> >  					      rlimit(RLIMIT_CORE));
> >  				break;
> > +			/* CPU the task ran on */
> > +			case 'C':
> > +				err = cn_printf(cn, "%d", cprm->cpu);
> > +				break;
> >  			default:
> >  				break;
> >  			}
> > @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >  		 */
> >  		.mm_flags = mm->flags,
> >  		.vma_meta = NULL,
> > +		.cpu = raw_smp_processor_id(),
> >  	};
> >  
> >  	audit_core_dumps(siginfo->si_signo);
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index 08a1d3e7e46d0..191dcf5af6cb9 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -22,6 +22,7 @@ struct coredump_params {
> >  	struct file *file;
> >  	unsigned long limit;
> >  	unsigned long mm_flags;
> > +	int cpu;
> >  	loff_t written;
> >  	loff_t pos;
> >  	loff_t to_skip;

-- 
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer





