[ Added Christoph, since the issue technically goes further back than when the warning appeared - it just used to be silent ] On Sun, Jul 4, 2021 at 10:29 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > This patch results in the following runtime warning on nommu systems. Ok, good, it actually found something. > [ 8.574191] [<21059cab>] (vfs_read) from [<2105d92b>] (read_code+0x15/0x2e) > [ 8.574329] [<2105d92b>] (read_code) from [<21085a8d>] (load_flat_file+0x341/0x4f0) > [ 8.574481] [<21085a8d>] (load_flat_file) from [<21085e03>] (load_flat_binary+0x47/0x2dc) > [ 8.574639] [<21085e03>] (load_flat_binary) from [<2105d581>] (bprm_execve+0x1fd/0x32c) Hmm. That actually loads things into user space, so the problem isn't that it shouldn't use vfs_read() or that iov_iter_init() would be doing somethign wrong - the problem appears purely to be that we're in an "uaccess_kernel()" region. And yes, we're still in the early init code: > [ 8.574797] [<2105d581>] (bprm_execve) from [<2105dbb3>] (kernel_execve+0xa3/0xac) > [ 8.574947] [<2105dbb3>] (kernel_execve) from [<211e7095>] (kernel_init+0x31/0xb0) > [ 8.575099] [<211e7095>] (kernel_init) from [<2100814d>] (ret_from_fork+0x11/0x24) which presumably runs with KERNEL_DS. Which is kind of bogus in the new world order. None of this *matters* for a nommu system, of course, which is why that code used to work, and why it's now warning. But for the same reason, it should still continue to work, despite the warning. Because iov_iter_init() will actually be doing the right thing and making it all about user pointers. Can you verify that it otherwise looks ok apart from the new warning? I *think* we should move to initializing the kernel state to "set_fs(USER_DS)", and that would be the right model these days. Of course, that could cause other things to pop up on architectures that haven't been converted away from CONFIG_SET_FS. The safer thing might be to move it earlier in kernel_execve(): it does end up doing that "set_fs(USER_DS)" eventually, but it's done fairly late in "begin_new_exec()" (and it's done as a force_uaccess_begin(), not set_fs(), but in a CONFIG_SET_FS configuration that ends up being what it does. > The same warning is also observed with m68k and mcf5208evb, > though the traceback isn't as nice. Hmm. Either the m68k trace printing is just bad, or maybe it just doesn't have CONFIG_KALLSYMS (or KALLSYMS_ALL) enabled. Anyway, does a hacky patch something like this diff --git a/fs/exec.c b/fs/exec.c index 38f63451b928..26293bd7c502 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1934,6 +1934,10 @@ int kernel_execve(const char *kernel_filename, int fd = AT_FDCWD; int retval; + // Make sure CONFIG_SET_FS architectures actually + // do things into user space. + force_uaccess_begin(); + filename = getname_kernel(kernel_filename); if (IS_ERR(filename)) return PTR_ERR(filename); make the warning go away? I really would like to set USER_DS even earlier, but this might be the least disruptive place. Anything that accesses kernel space should *not* depend on KERNEL_DS at this point, since that would make all the properly converted architectures already fail. And any architectures that haven't been converted away from CONFIG_SET_FS would have been hitting that force_uaccess_begin() later in begin_new_exec(), so they can't depend on any KERNEL_DS games after kernel_execve() either. So that one-liner is hacky, but *feels* safe to me. Am I perhaps missing something? It probably means we could remove the force_uaccess_begin() in begin_new_exec() entirely, but let's first see if the one-liner at least makes the warning go away. Linus