On Thu, May 18, 2023, Bagas Sanjaya wrote:
> On 5/18/23 20:57, Bagas Sanjaya wrote:
> > Hi,
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> >> I'm experiencing sporadic but persistent segmentation faults on the KVM
> >> VMs I manage. These faults began appearing after upgrading from Linux
> >> Kernel 4.x to 5.15.59. I further upgraded to 5.15.91 and transitioned the
> >> userspace from Debian 10 (buster) to Debian 11 (bullseye), yet the issues
> >> persist. Notably, the libc has also changed in the process as seen in the
> >> following error logs:

Was the host or guest kernel upgraded?  If the guest kernel was upgraded, it's
unlikely, though still possible, that this is a KVM bug.

> >> post.sh[21952]: bad frame in rt_sigreturn frame:000072db65961bb8
> >> ip:6c25f82a9a5d sp:72db65962168 orax:ffffffffffffffff in
> >> libc-2.28.so[6c25f8294000+147000]
> >>
> >> cron[7626]: bad frame in rt_sigreturn frame:000073ddebeb6ff8
> >> ip:72ad9f44d594 sp:73ddebeb75a8 orax:ffffffffffffffff in
> >> libc-2.28.so[72ad9f3a9000+147000]
> >>
> >> cron[64687]: bad frame in rt_sigreturn frame:000073265764b038
> >> ip:67c7b5a0f14a sp:73265764b5f0 orax:ffffffffffffffff in
> >> libc-2.31.so[67c7b596f000+159000]
> >>
> >> worker.py[54568]: bad frame in rt_sigreturn frame:000078eef6591cf8
> >> ip:6c9f9b2a604e sp:78eef6592298 orax:ffffffffffffffff in
> >> libpthread-2.31.so[6c9f9b29a000+10000]
> >>
> >>
> >> The segmentation faults occur 1-3 times daily across approximately 1000
> >> VMs running on hundreds of (supermicro, intel cpu) bare-metal servers.
> >> Currently, there's no reliable way for me to reproduce the issue. I
> >> initially considered this bug -
> >> https://www.spinics.net/lists/linux-tip-commits/msg61293.html - as a
> >> possible cause, but judging from the comments it likely isn't.
> >>
> >> The best approximation to a reproducer I have is a Python script that
> >> initiates several child processes and continuously sends them a sigusr1
> >> signal. Still, it takes a few hours to trigger the issue even when running
> >> this script on several hundred VMs.
> >>
> >> Switching to the 6.x kernel isn't immediately feasible as these are
> >> production systems with specific requirements. The transition is planned
> >> but will likely take several months.
> >>
> >> I'm looking for suggestions on how to more reliably reproduce this
> >> problem. Then I could try different old and new kernels and maybe narrow
> >> it down.
> >
> > See bugzilla for the full thread.
> >
> > Anyway, I'm adding it to regzbot:
> >
> > #regzbot introduced: v4.19..v5.15 https://bugzilla.kernel.org/show_bug.cgi?id=217457
> > #regzbot title: bad frame in rt_sigreturn (libc-related?) regression after 5.15 upgrade
> >
>
> Oops, I forgot to add the reporter:
>
> #regzbot from: Theodor Milkov <tm@xxxxxx>
>
> Sorry for inconvenience.
>
> -- 
> An old man doll... just what I always wanted! - Clara
>
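
The reporter's reproducer script is not included in this thread. For anyone
who wants to experiment, below is a rough sketch of the approach described in
the report (several child processes that are continuously sent SIGUSR1). The
names run_signal_storm and CHILD_CODE, the child count, and the duration are
all made up for illustration, and there is no claim that this matches the
reporter's script or that it triggers the "bad frame in rt_sigreturn" failure.

```python
import signal
import subprocess
import sys
import time

# Hypothetical child program: installs handlers, reports readiness, then
# spins in userspace so that signals interrupt running code.
CHILD_CODE = r"""
import signal

running = True

def on_usr1(signum, frame):
    pass  # handler body is irrelevant; signal delivery/return is the point

def on_term(signum, frame):
    global running
    running = False

signal.signal(signal.SIGUSR1, on_usr1)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while running:
    sum(range(1000))
"""


def run_signal_storm(num_children=4, duration_s=1.0):
    """Spawn children, send them SIGUSR1 for duration_s seconds, reap them.

    Returns (signals_sent, return_codes). A negative return code means the
    child was killed by that signal number (e.g. -11 for SIGSEGV).
    """
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(num_children)
    ]
    # Wait until every child has installed its handlers; the default action
    # for SIGUSR1 would otherwise terminate it.
    for child in children:
        child.stdout.readline()

    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for child in children:
            child.send_signal(signal.SIGUSR1)
            sent += 1

    # Ask the children to stop, then collect their exit statuses.
    for child in children:
        child.terminate()
    codes = [child.wait(timeout=30) for child in children]
    for child in children:
        child.stdout.close()
    return sent, codes
```

Calling run_signal_storm(num_children=8, duration_s=3600) in a guest and
checking the returned codes for negative values (together with dmesg for the
"bad frame in rt_sigreturn" message) is one way such a script could be used.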