On Tue, Mar 24, 2020 at 11:24:01PM -0400, Qian Cai wrote:
> > On Mar 24, 2020, at 10:13 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > 
> > On Tue, Mar 24, 2020 at 09:49:48PM -0400, Qian Cai wrote:
> > 
> >> It does not catch anything at all with the patch,
> > 
> > You mean, oops happens, but neither WARN_ON() is triggered?
> > Lovely...  Just to make sure: could you slap the same couple
> > of lines just before
> > 	if (unlikely(!d_can_lookup(nd->path.dentry))) {
> > in link_path_walk(), just to check if I have misread the trace
> > you've got?
> > 
> > Does that (+ other two inserts) end up with
> > 	1) some of these WARN_ON() triggered when oops happens or
> > 	2) oops is happening, but neither WARN_ON() triggers or
> > 	3) oops not happening / becoming harder to hit?
> 
> Only the one just before
> if (unlikely(!d_can_lookup(nd->path.dentry))) {
> In link_path_walk() will trigger.
> [ 245.767202][ T5020] pathname = /var/run/nscd/socket

Lovely.  So
	* we really do get NULL nd->path.dentry there; I've not misread
the trace.
	* on the entry into link_path_walk() nd->path.dentry is non-NULL.
	* *ALL* components should've been LAST_NORM ones
	* not a single symlink in sight, unless the setup is rather unusual
	* possibly not even a single mountpoint along the way (depending
upon the userland used)

And in the same loop we have
		if (likely(type == LAST_NORM)) {
			struct dentry *parent = nd->path.dentry;
			nd->flags &= ~LOOKUP_JUMPED;
			if (unlikely(parent->d_flags & DCACHE_OP_HASH)) {
				struct qstr this = { { .hash_len = hash_len }, .name = name };
				err = parent->d_op->d_hash(parent, &this);
				if (err < 0)
					return err;
				hash_len = this.hash_len;
				name = this.name;
			}
		}
upstream of that thing.  So NULL nd->path.dentry *there* would've oopsed.
IOW, what we are hitting is walk_component() with non-NULL nd->path.dentry
when we enter it, NULL being returned and nd->path.dentry becoming NULL
by the time we return from walk_component().
Could you post the results of
	stat / /var /var/run /var/run/nscd /var/run/nscd/socket
after the boot with working kernel?  Also, is that "hit on every boot"
or stochastic?  If it's the latter, I'd like to see the output of the
same thing on a successful boot of the same kernel, if possible...
Also, is the pathname always the same and if not, what other variants
have been observed?