Re: Bisected stability regression in 6.6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/12/23 02:22, Dr. David Alan Gilbert wrote:
* matoro (matoro_mailinglist_kernel@xxxxxxxxx) wrote:
On 2023-11-11 16:27, Sam James wrote:
Helge Deller <deller@xxxxxx> writes:

On 11/11/23 07:31, matoro wrote:
Hi Helge, I have bisected a regression in 6.6 which is causing
userspace segfaults at a significantly increased rate in kernel 6.6.
There seems to be a pathological case triggered by the ninja build
tool.  The test case I have been using is cmake with ninja backend to
attempt to build the nghttp2 package.  In 6.6, this segfaults, not at
the same location every time, but with enough reliability that I was
able to use it as a bisection regression case, including immediately
after a reboot.  In the kernel log, these show up as "trap #15: Data
TLB miss fault" messages.  Now these messages can and do show up in
6.5 causing segfaults, but never immediately after a reboot and
infrequently enough that the system is stable.  With kernel 6.6 I am
completely unable to build nghttp2 under any circumstances.

I have bisected this down to the following commit:

$ git bisect good
3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
commit 3033cd4307681c60db6d08f398a64484b36e0b0f
Author: Helge Deller <deller@xxxxxx>
Date:   Sat Aug 19 00:53:28 2023 +0200

      parisc: Use generic mmap top-down layout and brk randomization

      parisc uses a top-down layout by default that exactly fits
the generic
      functions, so get rid of arch specific code and use the
generic version
      by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.

      Note that on parisc the stack always grows up and a "unlimited stack"
      simply means that the value as defined in
CONFIG_STACK_MAX_DEFAULT_SIZE_MB
      should be used. So RLIM_INFINITY is not an indicator to use
the legacy
      memory layout.

      Signed-off-by: Helge Deller <deller@xxxxxx>

   arch/parisc/Kconfig             | 17 +++++++++++++
   arch/parisc/kernel/process.c    | 14 -----------
   arch/parisc/kernel/sys_parisc.c | 54
+----------------------------------------
   mm/util.c                       |  5 +++-
   4 files changed, 22 insertions(+), 68 deletions(-)

Thanks for your report!
I think it's quite unlikely that this patch introduces such a bad
regression.
I'd suspect some other bad commmit, but I'll try to reproduce.

matoro, does a revert apply cleanly? Does it help?

Yes, I just tested this and it cleanly reverts on linux-6.6.y and the revert
does fix the issue.

Helge:
   In that patch is:

diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4c..8810206444977 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -396,7 +396,10 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
         if (current->personality & ADDR_COMPAT_LAYOUT)
                 return 1;

-       if (rlim_stack->rlim_cur == RLIM_INFINITY)
+       /* On parisc the stack always grows up - so a unlimited stack should
+        * not be an indicator to use the legacy memory layout. */
+       if (rlim_stack->rlim_cur == RLIM_INFINITY &&
+               !IS_ENABLED(CONFIG_STACK_GROWSUP))
                 return 1;

         return sysctl_legacy_va_layout;

is that:
    '!IS_ENABLED(CONFIG_STACK_GROWSUP))'

  the right way around?

That feels inverted to me;  non-parisc don't have that config
set, so !IS_ENABLED... is true,  so they return 1 instead of checking
the flag?

Right. For non-parisc the behaviour didn't change with my patch, and this
is intended. If rlim_stack->rlim_cur == RLIM_INFINITY, non-parisc return 1 as before.

Note that matoro reported a regression specifically on the parisc platform.

This change:
-       if (rlim_stack->rlim_cur == RLIM_INFINITY)
+       if (rlim_stack->rlim_cur == RLIM_INFINITY &&
+               !IS_ENABLED(CONFIG_STACK_GROWSUP))
just changes the behaviour on parisc.
On parisc rlim_stack->rlim_cur == RLIM_INFINITY" is always true, unless the user
changed the stack limit manually. If unchanged, mmap_is_legacy() should return
sysctl_legacy_va_layout, otherwise 1.

So, I think that part of the patch is OK.

Helge




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux