On Tue, Sep 03, 2019 at 04:40:07PM +0100, Al Viro wrote:
> On Tue, Sep 03, 2019 at 10:44:32PM +0800, zhengbin (A) wrote:
> > We recently encountered an oops(the filesystem is tmpfs)
> > crash> bt
> >  #9 [ffff0000ae77bd60] dcache_readdir at ffff0000672954bc
> >
> > The reason is as follows:
> > Process 1 cat test which is not exist in directory A, process 2 cat test in directory A too.
> > process 3 create new file in directory B, process 4 ls directory A.
>
> good grief, what screen width do you have to make the table below readable?
>
> What I do not understand is how the hell does your dtry2 manage to get actually
> freed and reused without an RCU delay between its removal from parent's
> ->d_subdirs and freeing its memory.  What should've happened in that
> scenario is
> 	* process 4, in next_positive() grabs rcu_read_lock().
> 	* it walks into your dtry2, which might very well be
> just a chunk of memory waiting to be freed; it sure as hell is
> not positive.  skipped is set to true, 'i' is not decremented.
> Note that ->d_child.next points to the next non-cursor sibling
> (if any) or to the ->d_subdir of parent, so we can keep walking.
> 	* we keep walking for a while; eventually we run out of
> counter and leave the loop.
>
> Only after that we do rcu_read_unlock() and only then anything
> observed in that loop might be freed and reused.
>
> Confused...  OTOH, I might be misreading that table of yours -
> it's about 30% wider than the widest xterm I can get while still
> being able to read the font...

Incidentally, which kernel was that on?
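
For reference, the walk described in the quoted scenario can be sketched as a
userspace toy model.  This is not the kernel's next_positive(); struct
toy_dentry, toy_next_positive() and the toy_rcu_*() stubs are made-up names
for illustration, and the stubs are no-ops that only mark where the read-side
critical section would begin and end.  The sketch assumes a circular sibling
list closed by a head node and shows only the counting rule: a non-positive
entry sets 'skipped' and leaves 'i' alone, a positive one decrements 'i'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry on its parent's child list (not the kernel struct). */
struct toy_dentry {
	struct toy_dentry *next;	/* next sibling; the head node closes the ring */
	bool positive;			/* false: negative entry, cursor, or the head */
};

/* No-op placeholders: real RCU would hold off frees until the unlock. */
static void toy_rcu_read_lock(void) { }
static void toy_rcu_read_unlock(void) { }

/*
 * Return the count-th positive entry after 'from', or NULL if the walk
 * reaches 'head' first.  *skipped reports whether any non-positive entry
 * was stepped over on the way.
 */
static struct toy_dentry *toy_next_positive(struct toy_dentry *head,
					    struct toy_dentry *from,
					    int count, bool *skipped)
{
	struct toy_dentry *res = NULL;
	struct toy_dentry *p;
	int i = count;

	assert(head && from && skipped && count > 0);
	*skipped = false;

	toy_rcu_read_lock();
	for (p = from->next; p != head; p = p->next) {
		if (!p->positive) {
			*skipped = true;	/* 'i' is not decremented */
		} else if (--i == 0) {
			res = p;		/* found the count-th positive entry */
			break;
		}
	}
	toy_rcu_read_unlock();	/* only past this point could entries be freed */

	return res;
}
```

With a list head -> a(positive) -> b(negative) -> c(positive) -> head, asking
for the second positive entry after head returns c with 'skipped' set, since
b is stepped over without consuming the counter.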