On 10/20/20 7:21 PM, Jeroen Roovers wrote:
> On Sat, 29 Aug 2020 14:20:17 +0200
> Helge Deller <deller@xxxxxx> wrote:
>
>> HPUX has separate NDELAY & NONBLOCK values. In the past we wanted to
>> be able to run HP-UX binaries natively on parisc Linux which is why
>> we defined O_NONBLOCK to 000200004 to distinguish NDELAY & NONBLOCK
>> bits.
>> But with 2 bits set in this bitmask we often ran into compatibility
>> issues with other Linux applications which often only test one bit (or
>> even compare the values).
>>
>> To avoid such issues in the future, this patch changes O_NONBLOCK to
>> become 000200000. That way old programs will still be functional, and
>> for new programs we now have only one bit set.
>
> I am seeing a problem with this exact commit in userland, so I think
> that last sentence is incorrect:

Thanks for testing and bisecting!!!

I'm fine with reverting the change, but we really need to analyze
what is broken (and why).
In general the kernel sources seem ok. What matters is that code only
checks whether bits are set, not whether the value is equal to
something, e.g.
good:  if (flags & O_NONBLOCK) { ... }
bad:   if (flags == O_NONBLOCK) { ... }

> The first sign (in the boot process) that something is wrong is that
> idmapd fails to start:
>
>  * Starting idmapd ...
>  * make sure DNOTIFY support is enabled ...
>  [ !! ]
>  * ERROR: rpc.idmapd failed to start
>  * ERROR: cannot start nfsclient as rpc.idmapd would not start

Could you try an strace on it?
idmapd is from glibc, so I'll look into it too.

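
To make the difference between the "good" and "bad" checks above concrete,
here is a minimal sketch in C. It is an illustration only: the names
OLD_O_NONBLOCK, NEW_O_NONBLOCK, OTHER_FLAG and the helper functions are
made up for this example and are not kernel identifiers. The two octal
values are the ones quoted from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Octal values quoted in the patch description (names are hypothetical). */
#define OLD_O_NONBLOCK 000200004 /* two bits set (NDELAY + NONBLOCK) */
#define NEW_O_NONBLOCK 000200000 /* only one bit set */
#define OTHER_FLAG     000000002 /* stand-in for any unrelated flag bit */

/* "good": true if any bit of o_nonblock is present in flags */
bool nonblock_by_bit_test(int flags, int o_nonblock)
{
    return (flags & o_nonblock) != 0;
}

/* "bad": true only if flags is exactly o_nonblock */
bool nonblock_by_equality(int flags, int o_nonblock)
{
    return flags == o_nonblock;
}

void demo_checks(void)
{
    /* Old binary (built with the old value) on a kernel using the new value:
     * the bit test still matches, the equality comparison does not. */
    assert(nonblock_by_bit_test(OLD_O_NONBLOCK, NEW_O_NONBLOCK));
    assert(!nonblock_by_equality(OLD_O_NONBLOCK, NEW_O_NONBLOCK));

    /* Even with matching headers, equality breaks once another flag is set. */
    assert(nonblock_by_bit_test(NEW_O_NONBLOCK | OTHER_FLAG, NEW_O_NONBLOCK));
    assert(!nonblock_by_equality(NEW_O_NONBLOCK | OTHER_FLAG, NEW_O_NONBLOCK));
}
```

Calling demo_checks() from a small main() should pass all assertions. This
only covers the bit-test versus equality case; it says nothing about what
actually goes wrong in idmapd, elogind or sudo.
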
> Then, elogind reports an error when I ssh in as regular user:
>
> [ 297.825133][ T4273] elogind-daemon[4273]: Failed to register SIGHUP handler: Invalid argument
> [ 297.825133][ T4273] elogind-daemon[4273]: Failed to register SIGHUP handler: Invalid argument
> [ 298.040379][ T4273] elogind-daemon[4273]: Failed to fully start up daemon: Invalid argument
> [ 298.040379][ T4273] elogind-daemon[4273]: Failed to fully start up daemon: Invalid argument
>
> Yet the unprivileged user succeeds in logging in over SSH. Following
> that, sudo fails:
>
> jeroen@karsten ~ $ sudo -i
> sudo: unable to allocate memory
>
> root can still login on the serial console and over SSH.

At first glance I assume those issues are not related to the
O_NONBLOCK patch.
Can you try strace on sudo too?

> Would it make sense to rebuild libc against the newer kernel headers?

Yes, might make sense, but then my patch isn't compatible.
So, I'd like to avoid that.

> Or is this an unexpected result from the above commit and would it be
> useful to figure out what is going on while the bad kernel is running?

As I said, idmapd might be related; the elogind/sudo tests need checking.

Helge