On Tue, 2024-01-23 at 08:19 -0500, Jeff Layton wrote:
> On Tue, 2024-01-23 at 12:46 +0100, Sedat Dilek wrote:
> > On Tue, Jan 23, 2024 at 12:16 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > 
> > > On Tue, 2024-01-23 at 07:39 +0100, Linux regression tracking (Thorsten
> > > Leemhuis) wrote:
> > > > [a quick follow-up with an important correction from the reporter for
> > > > those I added to the list of recipients]
> > > > 
> > > > On 23.01.24 06:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > > > On 23.01.24 05:40, Paul Thompson wrote:
> > > > > > 
> > > > > > With my longstanding configuration, kernels up to 6.6.9 work fine.
> > > > > > Kernels 6.6.1[0123] and 6.7.[01] all lock up in early (open-rc) init,
> > > > > > before even the virtual filesystems are mounted.
> > > > > > 
> > > > > > The last thing visible on the console is the nfsclient service
> > > > > > being started and:
> > > > > > 
> > > > > > Call to flock failed: Function not implemented. (twice)
> > > > > > 
> > > > > > Then the machine is unresponsive, numlock doesn't toggle the keyboard
> > > > > > LED, and the Alt-SysRq chords appear to do nothing.
> > > > > > 
> > > > > > The problem is solved by changing my 6.6.9 config option:
> > > > > > 
> > > > > > # CONFIG_FILE_LOCKING is not set
> > > > > > to
> > > > > > CONFIG_FILE_LOCKING=y
> > > > > > 
> > > > > > (This option is under File systems -> Enable POSIX file locking API)
> > > > 
> > > > The reporter replied out-of-thread:
> > > > https://lore.kernel.org/all/Za9TRtSjubbX0bVu@xxxxxxxxxxxxxxx/
> > > > 
> > > > """
> > > > Now I feel stupid or like I'm losing it, but I went back and grepped for
> > > > CONFIG_FILE_LOCKING in my old configs, and it was turned on in all
> > > > but 6.6.9. So, somehow I turned that off *after* I built 6.6.9? Argh. I
> > > > just built 6.6.4 with it unset and that locked up too.
> > > > Sorry if this is just noise, though one would have hoped the failure
> > > > was less severe...
> > > > """
> > > > 
> > > 
> > > OK, so not necessarily a regression? It might be helpful to know the
> > > earliest kernel you can boot with CONFIG_FILE_LOCKING turned off.
> > > 
> > > I'll give reproducing this a try later, though.
> > 
> > Quote from Paul:
> > "
> > Now I feel stupid or like I'm losing it, but I went back and grepped
> > for CONFIG_FILE_LOCKING in my old configs, and it was turned on in all
> > but 6.6.9. So, somehow I turned that off *after* I built 6.6.9? Argh. I just
> > built 6.6.4 with it unset and that locked up too.
> > Sorry if this is just noise, though one would have hoped the failure
> > was less severe...
> > "
> > 
> > -Sedat-
> > 
> > https://lore.kernel.org/all/Za9TRtSjubbX0bVu@xxxxxxxxxxxxxxx/#t
> 
> OK, I can reproduce this in KVM, which should make this a bit simpler:
> I tried turning off CONFIG_FILE_LOCKING on mainline kernels, and it also
> hung for me at boot here (I think it was trying to enable the NVMe disks
> attached to this host):
> 
> [  OK  ] Reached target sysinit.target - System Initialization.
> [  OK  ] Finished dracut-pre-mount.service - dracut pre-mount hook.
> [  OK  ] Started plymouth-start.service - Show Plymouth Boot Screen.
> [  OK  ] Started systemd-ask-password-plymo…quests to Plymouth Directory Watch.
> [  OK  ] Reached target paths.target - Path Units.
> [  OK  ] Reached target basic.target - Basic System.
> [    4.647183] cryptd: max_cpu_qlen set to 1000
> [    4.650280] AVX2 version of gcm_enc/dec engaged.
> [    4.651252] AES CTR mode by8 optimization enabled
>          Starting systemd-vconsole-setup.service - Virtual Console Setup...
> [FAILED] Failed to start systemd-vconsole-s…up.service - Virtual Console Setup.
> See 'systemctl status systemd-vconsole-setup.service' for details.
> [    5.777176] virtio_blk virtio3: 8/0/0 default/read/poll queues
> [    5.784633] virtio_blk virtio3: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
> [    5.791351]  vda: vda1 vda2 vda3
> [    5.792672] virtio_blk virtio6: 8/0/0 default/read/poll queues
> [    5.801796] virtio_blk virtio6: [vdb] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.807839] virtio_blk virtio7: 8/0/0 default/read/poll queues
> [    5.813098] virtio_blk virtio7: [vdc] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.818500] virtio_blk virtio8: 8/0/0 default/read/poll queues
> [    5.823969] virtio_blk virtio8: [vdd] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [    5.829217] virtio_blk virtio9: 8/0/0 default/read/poll queues
> [    5.834636] virtio_blk virtio9: [vde] 209715200 512-byte logical blocks (107 GB/100 GiB)
> [ **] Job dev-disk-by\x2duuid-5a8a135f\x2…art running (13min 46s / no limit)
> 
> The last part will just keep spinning forever.
> 
> I've gone back as far as v6.0, and I see the same behavior. I then tried
> changing the disks in the VM to be attached by virtio instead of NVMe,
> and that also didn't help.
> 
> That said, I'm using a Fedora 39 cloud image here. I'm not sure it's
> reasonable to expect that to boot properly with file locking disabled.
> 
> Paul, what distro are you running? When you say that it's hung, are you
> seeing similar behavior?
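Incidentally, the "Call to flock failed: Function not implemented"
message in Paul's report is exactly what you'd expect here: with
CONFIG_FILE_LOCKING=n, fs/locks.c isn't built at all, so the flock
syscall falls back to sys_ni_syscall and returns ENOSYS. A trivial way
to check a running kernel (just a sketch for reference, not what openrc
actually runs):

        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/file.h>
        #include <unistd.h>

        int main(void)
        {
                /* any regular file will do as a lock target */
                int fd = open("/tmp/flock-test", O_RDWR | O_CREAT, 0600);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                /* LOCK_EX: take an exclusive BSD lock on the whole file */
                if (flock(fd, LOCK_EX) < 0)
                        printf("Call to flock failed: %s.\n", strerror(errno));
                else
                        printf("flock: ok\n");

                close(fd);
                return 0;
        }

On a CONFIG_FILE_LOCKING=y kernel that should just print "flock: ok";
with it disabled you get the same "Function not implemented" error that
the init scripts printed above.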
FWIW, I grabbed a dump of the VM's memory and took a quick look with
crash. All of the tasks are either idle or waiting in epoll. Perhaps
there is some subtle dependency between epoll and CONFIG_FILE_LOCKING?

PID: 190      TASK: ffff8fa846eb3080  CPU: 7    COMMAND: "systemd-journal"
 #0 [ffffb5560063bd18] __schedule at ffffffffa10e8d39
 #1 [ffffb5560063bd88] schedule at ffffffffa10e9491
 #2 [ffffb5560063bda0] schedule_hrtimeout_range_clock at ffffffffa10eff99
 #3 [ffffb5560063be10] do_epoll_wait at ffffffffa0a08106
 #4 [ffffb5560063bee8] __x64_sys_epoll_wait at ffffffffa0a0872d
 #5 [ffffb5560063bf38] do_syscall_64 at ffffffffa10d3af4
 #6 [ffffb5560063bf50] entry_SYSCALL_64_after_hwframe at ffffffffa12000e6
    RIP: 00007f975753cac7  RSP: 00007ffe07ab17b8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000000000000001e  RCX: 00007f975753cac7
    RDX: 000000000000001e  RSI: 000055d723ad8ca0  RDI: 0000000000000007
    RBP: 00007ffe07ab18d0   R8: 000055d723ad79ac   R9: 0000000000000007
    R10: 00000000ffffffff  R11: 0000000000000202  R12: 000055d723ad8ca0
    R13: 0000000000000010  R14: 000055d723ad33b0  R15: ffffffffffffffff
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b

Whether this is a regression or not, a lot of userland software relies
on file locking these days. Maybe this is a good time to consider
getting rid of CONFIG_FILE_LOCKING and just hardcoding it on.

Disabling it looks like it only saves a single pointer in struct inode
(the i_flctx field)[1]. I'm not sure that's worth the hassle of having
to deal with the extra test-matrix dimension. In a really stripped-down
configuration where you don't need file locking, are you likely to have
a lot of inodes in core anyway?

I guess you also save a little kernel text, but I still have to wonder
if it's worth it.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
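[1]: That pointer is the only CONFIG_FILE_LOCKING-conditional field in
struct inode. Quoting from memory, so treat this excerpt from
include/linux/fs.h as illustrative rather than verbatim:

        struct inode {
                ...
        #ifdef CONFIG_FILE_LOCKING
                /* lazily allocated the first time the inode is locked */
                struct file_lock_context        *i_flctx;
        #endif
                ...
        };

So on 64-bit that's 8 bytes per inode, not counting the
file_lock_context allocations that only happen once something actually
takes a lock on the file.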