On Tue, Sep 24, 2019 at 09:55:06AM -0700, Linus Torvalds wrote: > [ Sorry for html, I'm driving around ] > > On Mon, Sep 23, 2019, 19:52 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > > Argh... The things turned interesting. The tricky part is > > where do we handle switching cursors away from something > > that gets moved. > > > > I forget why we do that. Remind me? > > Anyway, I think to a first approximation, we should probably first just see > if the "remove cursors from the end" change already effectively makes most > of the regression go away. > > Because with that change, if you just readdir() things with a sufficiently > big buffer, you'll never have the cursor hang around over several system > calls. > > Before, you'd do a readdir, add the cursor, return to user space, then do > another readdir just to see the EOF, and then do the close(). > > So it used to leave the cursor in place over those two final system calls, > and now it's all gone. > > I'm sure that on the big systems you can still trigger the whole d_lock > contention on the parent, but I wonder if it's all that much of a problem > any more in practice with your other change.. If nothing else, it's DoSable right now. And getting rid of that would reduce the memory footprint, while we are at it. In any case, it looks like btrfs really wants an empty directory there, i.e. the right thing to do would be simple_lookup() for ->lookup. CIFS is potentially trickier. AFAICS, what's going on is * Windows has a strange export, called IPC$. Looks like it was (still is?) the place where they export their named pipes. From what I'd been able to figure out, it's always there and allows for some other uses - it can be used to get the list of exports. Whether the actual named pipes are seen there these days... no idea. * there seems to be nothing to prevent a server (modified samba, for example) from exporting whatever it wants under that name. * IF it can be non-empty, mounting it will end up with root directory where we can do lookups for whatever is there. getdents() on that root will show what's currently in dcache (== had been looked up and still has not been evicted by memory pressure). Mainline can get seriously buggered if dcache_readdir() steps into something that is going away. With the patches in this series that's no longer a problem. HOWEVER, if lookup in one subdirectory picks an alias for another subdirectory of root, we will be really screwed - shared lock on parent won't stop d_splice_alias() from moving the alias, and that can bloody well lead to freeing the original name. From under copy_to_user()... And grabbing a reference to dentry obviously doesn't prevent that - dentry itself won't get freed, but external name very well might be. Again, I don't know CIFS well enough to tell how IPC$ is really used. If it doesn't normally contain anything but named pipes, we can simply have cifs_lookup() refuse to return subdirectories on such mounts, with solves any problems with d_splice_alias(). If it's always empty - better yet. If the impressions above (and that's all those are) are correct, we might have a problem in mainline with bogus/malicious servers. That's independent from what we do with the cursors. I asked around a bit; no luck so far. It would be really nice if some CIFS folks could give a braindump on that thing...