On 20/10/23 17:02, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <arnd@xxxxxxxx> wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
We boot qemu-x86_64 and x86_64 with a 64-bit kernel and a 32-bit rootfs;
we call this compat mode boot testing. Recently it started failing to
reach the login prompt.
We have not seen any kernel crash logs.
Anders' bisection points to the first bad commit:
546694b8f658 autofs: add autofs_parse_fd()
Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
Reported-by: Anders Roxell <anders.roxell@xxxxxxxxxx>
I tried to find something in that commit that would be different
in compat mode, but don't see anything at all -- this appears
to be just a simple refactoring of the code, unlike the commits
that immediately follow it and that do change the mount
interface.
Unfortunately this makes it impossible to just revert the commit
on top of linux-next. Can you double-check your bisection by
testing 546694b8f658 and the commit before it again?
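A minimal sketch of that re-check, assuming a linux-next checkout is at hand. The helper name recheck_revs is hypothetical, and the build and boot steps are site-specific, so they are left as a comment:

```shell
# Print the two revisions to retest: the parent of the suspected
# commit first, then the suspected commit itself.
recheck_revs() {
    printf '%s\n' "${1}^" "$1"
}

# Usage (not run here; the build and boot test are placeholders):
#   for rev in $(recheck_revs 546694b8f658); do
#       git checkout --detach "$rev"
#       # build the kernel and run the compat boot test for this rev
#   done
```

If the parent boots and 546694b8f658 does not, the bisection result stands.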
I will try what you suggested.
Is this information helpful?
In linux-next, the regression first appears in next-20230926.
GOOD: next-20230925
BAD: next-20230926
$ git log --oneline next-20230925..next-20230926 -- fs/autofs/
dede367149c4 autofs: fix protocol sub version setting
e6ec453bd0f0 autofs: convert autofs to use the new mount api
1f50012d9c63 autofs: validate protocol version
9b2731666d1d autofs: refactor parse_options()
7efd93ea790e autofs: reformat 0pt enum declaration
a7467430b4de autofs: refactor super block info init
546694b8f658 autofs: add autofs_parse_fd()
bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
Right, and it looks like the bottom five patches of this
should be fairly harmless as they only try to move code
around in preparation of the later changes, and even the
other ones should not cause any difference between a 32-bit
or a 64-bit /sbin/mount binary.
If the native (full 64-bit or full 32-bit) test run still
works with the same version, there may be some other difference
here.
What are the exact mount options you pass to autofs in your fstab?
mount output shows like this,
systemd-1 on /proc/sys/fs/binfmt_misc type autofs
(rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
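For reference, the autofs entries and their options can also be pulled out of /proc/mounts directly. This is only a sketch; the function name list_autofs_mounts is made up for illustration, and it assumes the usual /proc/mounts field order (device, mount point, type, options):

```shell
# Read /proc/mounts-format lines on stdin and print the mount point
# and option string of every entry whose filesystem type is autofs.
list_autofs_mounts() {
    awk '$3 == "autofs" { print $2, $4 }'
}

# Usage (not run here):
#   list_autofs_mounts < /proc/mounts
```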
This is only the binfmt-misc mount, which should not
prevent your rootfs from getting mounted, but it's possible
that failure to mount this prevents you from running
32-bit binaries.
I see this comes from the "proc-sys-fs-binfmt_misc.automount"
service in systemd. I see this is defined in
https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.automount
but I don't know exactly what its purpose is here. On a
64-bit system, you normally use compat_binfmt_elf.ko to run
32-bit binaries, and this does not require any specific mount
points. Alternatively, you could use binfmt_misc.ko with
the procfs mount to configure running arbitrary binary
formats such as arm32 on x86_64 with qemu-user emulation.
I double-checked your rootfs image from
https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
to ensure that this indeed contains i386 executables rather than
arm32 ones, and that is all fine.
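A check of this kind can be sketched by reading the ELF header of a binary taken from the image. The helper name elf_arch is hypothetical. It assumes the file is a little-endian ELF object and does not verify the magic bytes; the offsets (byte 4 for EI_CLASS, bytes 18-19 for e_machine) and the machine numbers come from the ELF specification:

```shell
# Print the word size and machine type recorded in an ELF header.
elf_arch() {
    file=$1
    # EI_CLASS at offset 4: 1 = 32-bit, 2 = 64-bit
    class=$(od -An -tu1 -j4 -N1 "$file" | tr -d ' \t\n')
    # e_machine at offsets 18-19, assembled as little-endian
    lo=$(od -An -tu1 -j18 -N1 "$file" | tr -d ' \t\n')
    hi=$(od -An -tu1 -j19 -N1 "$file" | tr -d ' \t\n')
    machine=$(( lo + hi * 256 ))
    case $class in
        1) bits=32 ;;
        2) bits=64 ;;
        *) bits=unknown ;;
    esac
    case $machine in
        3)  name=i386 ;;      # EM_386
        40) name=arm ;;       # EM_ARM
        62) name=x86_64 ;;    # EM_X86_64
        *)  name="machine-$machine" ;;
    esac
    echo "$bits-bit $name"
}

# Usage (not run here; the path inside the mounted image is an assumption):
#   elf_arch /mnt/rootfs/bin/sh
```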
I also see in your log file at
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/log
that it is running the i386 binaries from the rootfs, but
it does get stuck soon after trying to set up the binfmt-misc
mount at the end of the log:
[  OK  ] Reached target local-fs.target - Local File Systems.
Starting systemd-binfmt.se…et Up Additional Binary Formats...
Starting systemd-tmpfiles-… Volatile Files and Directories...
Starting systemd-udevd.ser…ger for Device Events and Files...
[ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
[ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
[ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
quit
Were there any console log messages at the time the problem occurred?
I'm a bit out of ideas at this point. My best guess now is
that your bisection points to something in autofs that makes
it hang while setting up autofs, but that neither autofs
nor binfmt-misc are actually being used otherwise.
Maybe you can try to modify your rootfs to disable or remove
the systemd-binfmt.service, to confirm that autofs is not
actually needed here but does cause the crash?
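One way to do that without booting the image is to mask the units in the unpacked or loop-mounted rootfs. The sketch below relies on the fact that a masked unit is just a symlink to /dev/null under /etc/systemd/system. The function name mask_unit_in_rootfs and the /mnt/rootfs mount point are assumptions for illustration:

```shell
# Mask a systemd unit inside a rootfs tree by linking it to /dev/null,
# the same state "systemctl mask" leaves behind on a running system.
mask_unit_in_rootfs() {
    root=$1
    unit=$2
    mkdir -p "$root/etc/systemd/system" &&
        ln -sf /dev/null "$root/etc/systemd/system/$unit"
}

# Usage (not run here):
#   mask_unit_in_rootfs /mnt/rootfs systemd-binfmt.service
#   mask_unit_in_rootfs /mnt/rootfs proc-sys-fs-binfmt_misc.automount
```

Masking the automount unit as well keeps systemd from setting up the autofs mount at all, which helps separate the two suspects.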
Arnd