Re: [REGRESSION] 6.6.10+ and 6.7+ kernels lock up early in init.

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 23 Jan 2024 08:57:50 -0500

On Tue, 2024-01-23 at 08:19 -0500, Jeff Layton wrote:
> On Tue, 2024-01-23 at 12:46 +0100, Sedat Dilek wrote:
> > On Tue, Jan 23, 2024 at 12:16 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > 
> > > On Tue, 2024-01-23 at 07:39 +0100, Linux regression tracking (Thorsten
> > > Leemhuis) wrote:
> > > > [a quick follow up with an important correction from the reporter for
> > > > those I added to the list of recipients]
> > > > 
> > > > On 23.01.24 06:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > > > On 23.01.24 05:40, Paul Thompson wrote:
> > > > > > 
> > > > > > >   With my longstanding configuration, kernels up to 6.6.9 work fine.
> > > > > > Kernels 6.6.1[0123] and 6.7.[01] all lock up in early (open-rc) init,
> > > > > > before even the virtual filesystems are mounted.
> > > > > > 
> > > > > >   The last thing visible on the console is the nfsclient service
> > > > > > being started and:
> > > > > > 
> > > > > > > Call to flock failed: Function not implemented. (twice)
> > > > > > 
> > > > > > >   Then the machine is unresponsive, numlock doesn't toggle the keyboard LED,
> > > > > > > and the alt-sysrq chords appear to do nothing.
> > > > > > 
> > > > > >   The problem is solved by changing my 6.6.9 config option:
> > > > > > 
> > > > > > # CONFIG_FILE_LOCKING is not set
> > > > > > to
> > > > > > CONFIG_FILE_LOCKING=y
> > > > > > 
> > > > > > (This option is under File Systems > Enable POSIX file locking API)
> > > > 
> > > > The reporter replied out-of-thread:
> > > > https://lore.kernel.org/all/Za9TRtSjubbX0bVu@xxxxxxxxxxxxxxx/
> > > > 
> > > > """
> > > > >       Now I feel stupid or like I'm losing it, but I went back and grepped for
> > > > > the CONFIG_FILE_LOCKING in my old Configs, and it was turned on in all
> > > > > but 6.6.9. So, somehow I turned that off *after* I built 6.6.9? Argh. I
> > > > > just built 6.6.4 with it unset and that locked up too.
> > > >       Sorry if this is just noise, though one would have hoped the failure
> > > > was less severe...
> > > > """
> > > > 
> > > 
> > > Ok, so not necessarily a regression? It might be helpful to know the
> > > earliest kernel you can boot with CONFIG_FILE_LOCKING turned off.
> > > 
> > > > > 
> > > I'll try reproducing this later though.
> > 
> > Quote from Paul:
> > "
> > > Now I feel stupid or like I'm losing it, but I went back and grepped
> > > for the CONFIG_FILE_LOCKING in my old Configs, and it was turned on in all
> > > but 6.6.9. So, somehow I turned that off *after* I built 6.6.9? Argh. I just
> > > built 6.6.4 with it unset and that locked up too.
> > Sorry if this is just noise, though one would have hoped the failure
> > was less severe...
> > "
> > 
> > -Sedat-
> > 
> > https://lore.kernel.org/all/Za9TRtSjubbX0bVu@xxxxxxxxxxxxxxx/#t
> > 
> > 
> 
> Ok, I can reproduce this in KVM, which should make this a bit simpler:
> 
> I tried turning off CONFIG_FILE_LOCKING on mainline kernels and it also
> hung for me at boot here (I think it was trying to enable the nvme disks
> attached to this host):
> 
> [  OK  ] Reached target sysinit.target - System Initialization.
> [  OK  ] Finished dracut-pre-mount.service - dracut pre-mount hook.
> [  OK  ] Started plymouth-start.service - Show Plymouth Boot Screen.
> [  OK  ] Started systemd-ask-password-plymo…quests to Plymouth Directory Watch.
> [  OK  ] Reached target paths.target - Path Units.
> [  OK  ] Reached target basic.target - Basic System.
> [    4.647183] cryptd: max_cpu_qlen set to 1000
> [    4.650280] AVX2 version of gcm_enc/dec engaged.
> [    4.651252] AES CTR mode by8 optimization enabled
>          Starting systemd-vconsole-setup.service - Virtual Console Setup...
> [FAILED] Failed to start systemd-vconsole-s…up.service - Virtual Console Setup.
> See 'systemctl status systemd-vconsole-setup.service' for details.
> [    5.777176] virtio_blk virtio3: 8/0/0 default/read/poll queues
> [    5.784633] virtio_blk virtio3: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
> [    5.791351]  vda: vda1 vda2 vda3
> [    5.792672] virtio_blk virtio6: 8/0/0 default/read/poll queues
> [    5.801796] virtio_blk virtio6: [vdb] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.807839] virtio_blk virtio7: 8/0/0 default/read/poll queues
> [    5.813098] virtio_blk virtio7: [vdc] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.818500] virtio_blk virtio8: 8/0/0 default/read/poll queues
> [    5.823969] virtio_blk virtio8: [vdd] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.829217] virtio_blk virtio9: 8/0/0 default/read/poll queues
> [    5.834636] virtio_blk virtio9: [vde] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    **] Job dev-disk-by\x2duuid-5a8a135f\x2…art running (13min 46s / no limit)
> 
> 
> The last part will just keep spinning forever.
> 
> I've gone back as far as v6.0, and I see the same behavior. I then tried
> changing the disks in the VM to be attached by virtio instead of NVMe,
> and that also didn't help.
> 
> That said, I'm using a Fedora 39 cloud image here. I'm not sure it's
> reasonable to expect that to boot properly with file locking disabled.
>  
> Paul, what distro are you running? When you say that it's hung, are you
> seeing similar behavior?

FWIW, I grabbed a dump of the VM's memory and took a quick look with
crash. All of the tasks are either idle, or waiting in epoll. Perhaps
there is some subtle dependency between epoll and CONFIG_FILE_LOCKING?

PID: 190      TASK: ffff8fa846eb3080  CPU: 7    COMMAND: "systemd-journal"
 #0 [ffffb5560063bd18] __schedule at ffffffffa10e8d39
 #1 [ffffb5560063bd88] schedule at ffffffffa10e9491
 #2 [ffffb5560063bda0] schedule_hrtimeout_range_clock at ffffffffa10eff99
 #3 [ffffb5560063be10] do_epoll_wait at ffffffffa0a08106
 #4 [ffffb5560063bee8] __x64_sys_epoll_wait at ffffffffa0a0872d
 #5 [ffffb5560063bf38] do_syscall_64 at ffffffffa10d3af4
 #6 [ffffb5560063bf50] entry_SYSCALL_64_after_hwframe at ffffffffa12000e6
    RIP: 00007f975753cac7  RSP: 00007ffe07ab17b8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000000000000001e  RCX: 00007f975753cac7
    RDX: 000000000000001e  RSI: 000055d723ad8ca0  RDI: 0000000000000007
    RBP: 00007ffe07ab18d0   R8: 000055d723ad79ac   R9: 0000000000000007
    R10: 00000000ffffffff  R11: 0000000000000202  R12: 000055d723ad8ca0
    R13: 0000000000000010  R14: 000055d723ad33b0  R15: ffffffffffffffff
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b

Whether this is a regression or not, a lot of userland software relies
on file locking these days. Maybe this is a good time to consider
getting rid of CONFIG_FILE_LOCKING and just hardcoding it on.
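
For anyone who wants to check from userland whether a running kernel
has flock() at all, something like the following could serve as a quick
probe. This is only a rough sketch: the helper name flock_supported() is
made up for illustration, and it assumes that a kernel built without
CONFIG_FILE_LOCKING answers flock() with ENOSYS, which is what the
"Function not implemented" message in the original report suggests.

```python
import errno
import fcntl
import tempfile


def flock_supported():
    """Return True if flock() works on a scratch file, False on ENOSYS.

    Hypothetical helper: it takes and releases an exclusive lock on an
    anonymous temporary file, so it never touches any existing file.
    """
    with tempfile.TemporaryFile() as scratch:
        try:
            fcntl.flock(scratch.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOSYS:
                # The syscall is not implemented by this kernel.
                return False
            raise  # any other error is unexpected here; let it surface
        fcntl.flock(scratch.fileno(), fcntl.LOCK_UN)
        return True


if __name__ == "__main__":
    print("flock() available:", flock_supported())
```

It only exercises flock(); it says nothing about how POSIX (fcntl) locks
behave on such a kernel.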

By disabling it, it looks like you save 4 bytes in struct inode. I'm not
sure that's worth the hassle of having to deal with the extra test
matrix dimension. In a really stripped down configuration where you
don't need file locking, are you likely to have a lot of inodes in core
anyway?

I guess you save a little kernel text too, but I still have to
wonder if it's worth it.
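
Since part of the confusion here came from grepping old configs by
hand, a small helper along these lines might make it easier to compare
them. Again just a sketch: option_state() is a hypothetical name, and it
only understands the two line forms Paul quoted ("# CONFIG_FILE_LOCKING
is not set" and "CONFIG_FILE_LOCKING=y"), plus the case where the option
does not appear at all.

```python
def option_state(config_text, option="CONFIG_FILE_LOCKING"):
    """Report how a kernel config option is set in the given config text.

    Returns the value after '=' (for example 'y'), 'n' if the option is
    explicitly marked "is not set", or None if it is not mentioned.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    sample = "# CONFIG_FILE_LOCKING is not set\nCONFIG_FOO=y\n"
    print(option_state(sample))  # prints: n
```

Feeding it the contents of each saved config file would show at a
glance which builds had the option turned off.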
-- 
Jeff Layton <jlayton@xxxxxxxxxx>