On 10/22/20 5:38 PM, Jeroen Roovers wrote:
> On Wed, 21 Oct 2020 08:07:15 +0200
> Helge Deller <deller@xxxxxx> wrote:
>
>> On 10/20/20 7:21 PM, Jeroen Roovers wrote:
>>> On Sat, 29 Aug 2020 14:20:17 +0200
>>> Helge Deller <deller@xxxxxx> wrote:
>>>
>>>> HPUX has separate NDELAY & NONBLOCK values. In the past we wanted
>>>> to be able to run HP-UX binaries natively on parisc Linux which is
>>>> why we defined O_NONBLOCK to 000200004 to distinguish NDELAY &
>>>> NONBLOCK bits.
>>>> But with 2 bits set in this bitmask we often ran into compatibility
>>>> issues with other Linux applications which often only test one bit
>>>> (or even compare the values).
>>>>
>>>> To avoid such issues in the future, this patch changes O_NONBLOCK
>>>> to become 000200000. That way old programs will still be
>>>> functional, and for new programs we now have only one bit set.
>>>
>>> I am seeing a problem with this exact commit in userland, so I think
>>> that last sentence is incorrect:
>>
>> Thanks for testing and bisecting!!!
>>
>> I'm fine with reverting the change, but we really need to
>> analyze what is broken (and why).
>>
>> In general the kernel sources seem ok as it's important,
>> that code just check if bits are set, not if the value
>> is equal to something e.g.
>> good: if (flags & O_NONBLOCK) { ... }
>> bad: if (flags == O_NONBLOCK) { .... }
>>
>>
>>> The first sign (in the boot process) that something is wrong is that
>>> idmapd fails to start:
>>>
>>> * Starting idmapd ...
>>> * make sure DNOTIFY support is enabled ...
>>> [ !! ]
>>> * ERROR: rpc.idmapd failed to start
>>> * ERROR: cannot start nfsclient as rpc.idmapd would not start
>>
>> Could you try an strace on it?
>
> [after editing the startup script to run `strace -f .. rpc.idmapd`:]
>
> https://rooversj.home.xs4all.nl/rpc.idmapd.strace
>
>> idmapd is from glibc, so I'll look into it too.
>>
>>> Then, elogind reports an error when I ssh in as regular user:
>>>
>>> [ 297.825133][ T4273] elogind-daemon[4273]: Failed to register
>>> SIGHUP handler: Invalid argument
>>> [ 297.825133][ T4273] elogind-daemon[4273]: Failed to register
>>> SIGHUP handler: Invalid argument [ 298.040379][ T4273]
>>> elogind-daemon[4273]: Failed to fully start up daemon: Invalid
>>> argument [ 298.040379][T4273] elogind-daemon[4273]: Failed to
>>> fully start up daemon: Invalid argument
>
> strace -f -o /tmp/elogind.strace /lib/elogind/elogind
>
> https://rooversj.home.xs4all.nl/elogind.strace
>
>>>
>>> Yet the unprivileged user succeeds in logging in over SSH. Following
>>> that, sudo fails:
>>>
>>> jeroen@karsten ~ $ sudo -i
>>> sudo: unable to allocate memory
>>>
>>> root can still login on the serial console and over SSH.
>>
>> At first thought I assume those issues are not related to the
>> O_NONBLOCK patch. Can you try strace on the sudo too ?
>
> strace -f -u jeroen sudo -i
> [...]
> pipe2(0x42f4712c, O_NONBLOCK|O_CLOEXEC) = -1 EINVAL (Invalid argument)
> openat(AT_FDCWD, 0xfadd9b50, O_RDONLY|O_CLOEXEC) = 3
> fstat64(3, 0xfadd9e88) = 0
> read(3, 0x42f47258, 4096) = 2998
> read(3, "", 4096) = 0
> close(3) = 0
> openat(AT_FDCWD, 0x42f46bc0, O_RDONLY) = -1 ENOENT (No such file or
> directory) openat(AT_FDCWD, 0x42f46f48, O_RDONLY) = -1 ENOENT (No such
> file or directory) openat(AT_FDCWD, 0x42f46fc8, O_RDONLY) = -1 ENOENT
> (No such file or directory) openat(AT_FDCWD, 0x42f46f78, O_RDONLY) =
> -1 ENOENT (No such file or directory) ioctl(2, TCGETS, 0xfadd9c08)
> = 0 write(2, 0xfadd7f46, 4sudo) = 4
> ioctl(2, TCGETS, 0xfadd9c08) = 0
> write(2, 0xf8151e94, 2: ) = 2
> ioctl(2, TCGETS, 0xfadd9c08) = 0
> write(2, 0xfadd9608, 25unable to allocate memory) = 25
> ioctl(2, TCGETS, 0xfadd9c08) = 0
> write(2, 0x42e39934, 2
> ) = 2
> exit_group(1) = ?
> +++ exited with 1 +++
>
> https://rooversj.home.xs4all.nl/sudo-i.strace

Thanks! I found the issue.
The syscalls timerfd_create(), signalfd4(), eventfd2(), userfaultfd() and
pipe2() take an fcntl-style flags parameter which is checked strictly:
any bit the kernel does not recognize makes the call return EINVAL.
Existing binaries still pass the old O_NONBLOCK value (000200004), so
these calls now return EINVAL and the programs fail.

I found systemd-udevd and udevadm failing because of this. sudo and
elogind seem affected too.

I'm sending an RFC patch in a few minutes.

Helge