Re: Bisected stability regression in 6.6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/11/23 07:31, matoro wrote:
Hi Helge, I have bisected a regression in 6.6 which is causing
userspace segfaults at a significantly increased rate in kernel 6.6.
There seems to be a pathological case triggered by the ninja build
tool.  The test case I have been using is cmake with ninja backend to
attempt to build the nghttp2 package.  In 6.6, this segfaults, not at
the same location every time, but with enough reliability that I was
able to use it as a bisection regression case, including immediately
after a reboot.  In the kernel log, these show up as "trap #15: Data
TLB miss fault" messages.  Now these messages can and do show up in
6.5 causing segfaults, but never immediately after a reboot and
infrequently enough that the system is stable.  With kernel 6.6 I am
completely unable to build nghttp2 under any circumstances.

I have bisected this down to the following commit:

$ git bisect good
3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
commit 3033cd4307681c60db6d08f398a64484b36e0b0f
Author: Helge Deller <deller@xxxxxx>
Date:   Sat Aug 19 00:53:28 2023 +0200

     parisc: Use generic mmap top-down layout and brk randomization

     parisc uses a top-down layout by default that exactly fits the generic
     functions, so get rid of arch specific code and use the generic version
     by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.

     Note that on parisc the stack always grows up and a "unlimited stack"
     simply means that the value as defined in CONFIG_STACK_MAX_DEFAULT_SIZE_MB
     should be used. So RLIM_INFINITY is not an indicator to use the legacy
     memory layout.

     Signed-off-by: Helge Deller <deller@xxxxxx>

  arch/parisc/Kconfig             | 17 +++++++++++++
  arch/parisc/kernel/process.c    | 14 -----------
  arch/parisc/kernel/sys_parisc.c | 54 +----------------------------------------
  mm/util.c                       |  5 +++-
  4 files changed, 22 insertions(+), 68 deletions(-)

Thanks for your report!
I think it's quite unlikely that this patch introduces such a bad regression.
I'd suspect some other bad commmit, but I'll try to reproduce.

In any case, do you have CONFIG_BPF_JIT enabled? If so, could you try
to reproduce with CONFIG_BPF_JIT disabled?
The JIT is quite new in v6.6 and I did face some crashes and disabling
it helped me so far.

I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe
("parisc: Add nop instructions after TLB inserts") on top of 6.6, but
it does NOT fix the issue.

Ok.

Helge




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux