Re: [PATCH] x86_64: Fix check for __per_cpu_offset initialisation

lijiang <lijiang@xxxxxxxxxx> · Thu, 12 Aug 2021 12:53:31 +0800

Date: Wed, 11 Aug 2021 14:24:30 +0200
From: Philipp Rudo <prudo@xxxxxxxxxx>
To: lijiang <lijiang@xxxxxxxxxx>
Cc: "Discussion list for crash utility usage,   maintenance and
        development" <crash-utility@xxxxxxxxxx>
Subject: Re:  [PATCH] x86_64: Fix check for
        __per_cpu_offset initialisation
Message-ID: <20210811142430.5e3e1a86@rhtmp>
Content-Type: text/plain; charset=US-ASCII

Hi Lianbo,

On Wed, 11 Aug 2021 17:05:26 +0800
lijiang <lijiang@xxxxxxxxxx> wrote:

> >
> > Date: Thu,  5 Aug 2021 15:19:37 +0200
> > From: Philipp Rudo <prudo@xxxxxxxxxx>
> > To: crash-utility@xxxxxxxxxx
> > Subject:  [PATCH] x86_64: Fix check for
> >         __per_cpu_offset        initialisation
> > Message-ID: <20210805131937.5051-1-prudo@xxxxxxxxxx>
> >
> > Since at least kernel v2.6.30 the __per_cpu_offset gets initialized to
> > __per_cpu_load. So first check if the __per_cpu_offset was set to a
> > proper value before reading any per cpu variable to prevent potential
> > bugs.
> >
> >  
> Hi, Philipp
> 
> Thank you for the patch. Could you describe the potential risks in more
> detail, and what conditions might trigger the potential bugs?

The bug is always triggered during initialization of the per-cpu data
on x86_64, at least for kernels that do not use struct x8664_pda, which
AFAIK was also removed with kernel v2.6.30.

The risk for crash is low. Right after the superfluous read there is a
check whether the cpunumber that was read matches the expected one:

                         if (cpunumber != cpus)
                                 break;

So the worst case scenario I see is that crash initializes one
additional cpu with nonsense data. But given that the bug has existed
for ~12 years and nobody has reported such a bug, I assume that the
check has worked well so far.
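
To illustrate the ordering described above, here is a minimal sketch of
the fixed loop. It is not the actual crash code: struct cpu_view,
count_initialized_cpus() and the per_cpu_load parameter are hypothetical
stand-ins that only model the idea of checking __per_cpu_offset against
__per_cpu_load before reading any per-cpu variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what would be read for one cpu; not a crash struct. */
struct cpu_view {
    uint64_t per_cpu_offset; /* models __per_cpu_offset[cpu] */
    int cpu_number;          /* models the per-cpu cpu number behind that offset */
};

/*
 * Count the cpus whose per-cpu area looks initialized.
 * per_cpu_load models the address of __per_cpu_load, i.e. the value an
 * untouched __per_cpu_offset entry still holds.
 */
int count_initialized_cpus(const struct cpu_view *views, size_t nr,
                           uint64_t per_cpu_load)
{
    int cpus = 0;

    for (size_t i = 0; i < nr; i++) {
        /* Check the offset first, so no per-cpu read happens for an
         * entry that was never set up. */
        if (views[i].per_cpu_offset == per_cpu_load)
            break;

        /* Only now look at the per-cpu variable and sanity-check it,
         * as the existing code already does. */
        if (views[i].cpu_number != cpus)
            break;

        cpus++;
    }

    return cpus;
}
```

With the offset check in front, the cpunumber comparison stays as a
second line of defence instead of being the only thing that stops the
loop.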

Thank you for the detailed explanation, Philipp.

> Did you mean that it's related to the crash live analysis issue (1978032)?
> I tried to reproduce it with the upstream kernel, but so far without
> success.

Yes, this bug is related to bz1978032. For whatever reason the
superfluous read triggered the panic.

I could reproduce the bug upstream with CONFIG_IO_URING _disabled_.
Unfortunately there is a RHEL-only patch [1] that tampers with the
Kconfig for IO_URING. So when you copy a kernel-ark config to the
upstream repo and run 'make oldconfig' the IO_URING will silently be
_enabled_.

You are right.

BTW, I tried to reproduce the panic yesterday on kernel-5.14.0-0.rc4
but failed. Not sure if the bug was fixed in the meantime or I was
simply "lucky"...

This issue may have been fixed in kernel-5.14.0-0.rc4. However, this patch
is still meaningful and can prevent potential risks.

Acked-by: Lianbo Jiang <lijiang@xxxxxxxxxx>
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility