On Tue, Sep 03, 2019 at 10:44:32PM +0800, zhengbin (A) wrote:
> We recently encountered an oops(the filesystem is tmpfs)
> crash> bt
>  #9 [ffff0000ae77bd60] dcache_readdir at ffff0000672954bc
>
> The reason is as follows:
> Process 1 cat test which is not exist in directory A, process 2 cat test in directory A too.
> process 3 create new file in directory B, process 4 ls directory A.

good grief, what screen width do you have to make the table below readable?

What I do not understand is how the hell does your dtry2 manage to get
actually freed and reused without an RCU delay between its removal from
parent's ->d_subdirs and freeing its memory.  What should've happened in
that scenario is
	* process 4, in next_positive() grabs rcu_read_lock().
	* it walks into your dtry2, which might very well be just a chunk
of memory waiting to be freed; it sure as hell is not positive.  skipped
is set to true, 'i' is not decremented.  Note that ->d_child.next points
to the next non-cursor sibling (if any) or to the ->d_subdirs of parent,
so we can keep walking.
	* we keep walking for a while; eventually we run out of counter
and leave the loop.

Only after that do we do rcu_read_unlock(), and only then might anything
observed in that loop be freed and reused.

Confused...  OTOH, I might be misreading that table of yours - it's about
30% wider than the widest xterm I can get while still being able to read
the font...
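
For reference, the walk described above can be modelled in userspace roughly like this. It is a sketch only, not the kernel code: struct toy_node, toy_next_positive() and the no-op toy_rcu_read_lock()/toy_rcu_read_unlock() are made-up names, the 'positive' flag stands in for the positivity check, and nothing here is ever freed. All it shows is the shape of the loop: non-positive entries set skipped without touching the counter, and the read-side section is left only once the walk is over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a directory entry; NOT the real struct dentry. */
struct toy_node {
	struct toy_node *next;	/* plays the role of ->d_child.next */
	bool positive;		/* plays the role of the positivity check */
};

/* No-op placeholders marking where the RCU read-side section sits. */
static void toy_rcu_read_lock(void) { }
static void toy_rcu_read_unlock(void) { }

/*
 * Return the count-th positive node after 'from', or NULL if the walk
 * gets back to 'head' (the parent's list head) first.  *skipped is set
 * if any non-positive node was stepped over.
 */
static struct toy_node *toy_next_positive(struct toy_node *head,
					  struct toy_node *from,
					  int count, bool *skipped)
{
	struct toy_node *res = NULL;
	struct toy_node *p;
	int i = count;

	assert(count > 0);
	*skipped = false;

	toy_rcu_read_lock();
	for (p = from->next; p != head; p = p->next) {
		if (!p->positive) {
			*skipped = true;	/* 'i' is not decremented */
		} else if (!--i) {
			res = p;		/* ran out of counter */
			break;
		}
	}
	/* Only past this point could anything seen above be freed. */
	toy_rcu_read_unlock();
	return res;
}
```

With a list head -> a (positive) -> b (not positive) -> c (positive) -> head, asking for the second positive entry after head steps over b, sets skipped and returns c.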