> On Mar 25, 2020, at 12:03 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 24, 2020 at 11:24:01PM -0400, Qian Cai wrote:
>
>>> On Mar 24, 2020, at 10:13 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Mar 24, 2020 at 09:49:48PM -0400, Qian Cai wrote:
>>>
>>>> It does not catch anything at all with the patch,
>>>
>>> You mean, oops happens, but neither WARN_ON() is triggered?
>>> Lovely... Just to make sure: could you slap the same couple
>>> of lines just before
>>> 	if (unlikely(!d_can_lookup(nd->path.dentry))) {
>>> in link_path_walk(), just to check if I have misread the trace
>>> you've got?
>>>
>>> Does that (+ other two inserts) end up with
>>> 1) some of these WARN_ON() triggered when oops happens or
>>> 2) oops is happening, but neither WARN_ON() triggers or
>>> 3) oops not happening / becoming harder to hit?
>>
>> Only the one just before
>> if (unlikely(!d_can_lookup(nd->path.dentry))) {
>> In link_path_walk() will trigger.
>
>> [ 245.767202][ T5020] pathname = /var/run/nscd/socket
>
> Lovely. So
> 	* we really do get NULL nd->path.dentry there; I've not misread the
> trace.
> 	* on the entry into link_path_walk() nd->path.dentry is non-NULL.
> 	* *ALL* components should've been LAST_NORM ones
> 	* not a single symlink in sight, unless the setup is rather unusual
> 	* possibly not even a single mountpoint along the way (depending
> upon the userland used)
>
> And in the same loop we have
> 		if (likely(type == LAST_NORM)) {
> 			struct dentry *parent = nd->path.dentry;
> 			nd->flags &= ~LOOKUP_JUMPED;
> 			if (unlikely(parent->d_flags & DCACHE_OP_HASH)) {
> 				struct qstr this = { { .hash_len = hash_len }, .name = name };
> 				err = parent->d_op->d_hash(parent, &this);
> 				if (err < 0)
> 					return err;
> 				hash_len = this.hash_len;
> 				name = this.name;
> 			}
> 		}
> upstream of that thing. So NULL nd->path.dentry *there* would've oopsed.
> IOW, what we are hitting is walk_component() with non-NULL nd->path.dentry
> when we enter it, NULL being returned and nd->path.dentry becoming NULL
> by the time we return from walk_component().
>
> Could you post the results of
> 	stat / /var /var/run /var/run/nscd /var/run/nscd/socket

The file is gone after a successful boot:

# stat / /var /var/run /var/run/nscd /var/run/nscd/socket
  File: /
  Size: 244 Blocks: 0 IO Block: 65536 directory
Device: fe00h/65024d Inode: 128 Links: 17
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-03-24 14:21:27.112559236 -0400
Modify: 2020-03-24 14:21:25.840486593 -0400
Change: 2020-03-24 14:21:25.840486593 -0400
 Birth: -
  File: /var
  Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: fe00h/65024d Inode: 133 Links: 21
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-08-12 05:57:57.000000000 -0400
Modify: 2020-03-23 21:29:31.087264900 -0400
Change: 2020-03-23 21:29:31.087264900 -0400
 Birth: -
  File: /var/run -> ../run
  Size: 6 Blocks: 0 IO Block: 65536 symbolic link
Device: fe00h/65024d Inode: 143 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-03-24 17:34:11.865030724 -0400
Modify: 2020-03-23 17:16:40.573974805 -0400
Change: 2020-03-23 17:16:40.573974805 -0400
 Birth: -
stat: cannot stat '/var/run/nscd': No such file or directory
stat: cannot stat '/var/run/nscd/socket': No such file or directory

> after the boot with working kernel? Also, is that "hit on every boot" or
> stochastic? If it's the latter, I'd like to see the output of the same
> thing on a successful boot of the same kernel, if possible...

It does not hit every time, so I used a cron job:

@reboot sleep 180; systemctl reboot

It has always hit it within an hour so far.