Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Tue, 21 Apr 2015 22:20:07 +0100

On Tue, Apr 21, 2015 at 04:45:04PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:
> 
> > I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
> > along with other broken UML stuff but I was too late to ask on the UML mailinglist
> > if someone is using it (which I really doubt).
> > So, let's kill it with v4.2.
> 
> Let's do it.  Then ->put_link() is left in an interesting situation - *all*
> instances only use the 'cookie' argument...

OK, so here's what we have:
	* a lot of filesystems are using page_follow_link_light(); for RCU
mode they should simply look for page and if it's there and uptodate, that's
it - just grab a reference and be done.  If it's not uptodate - oh, well,
fallback to non-RCU mode.  Corresponding ->put_link() doesn't give a damn
which inode or dentry it is - it's just page_cache_release() (we need to
get rid of that kmap() crap anyway).
	* a lot of fast symlinks are using only inode; no ->put_link(),
no blocking operations, etc.  No problem at all.
	* shmem would probably want something similar to what
page_follow_link_light() would be doing for RCU case.
	* befs: should switch to page_follow_link_light(); just a matter of
giving it proper ->readpage().
	* NFS: probably as in Neil's series, except that we really ought to
add a helper for what page_follow_link_light() would do in RCU case, rather
than open-coding it here (and, again, kmap/kunmap crap should go)
	* /proc/self and its per-thread ilk: just do GFP_ATOMIC allocation for
RCU case (and handle failure as -ECHILD rather than usual -ENOMEM).
	* proc_symlink() stuff: uses only inode, nothing blocking, no problem.
	* 9p, cifs and fuse: those always query server on ->follow_link();
-ECHILD and be done with that.  _IF_ they want some kind of caching, they can
do as NFS does.  hostfs is that way too.
	* gfs2: _probably_ want to bugger off with -ECHILD; OTOH, ocfs2
uses page_follow_link_light(), maybe correctly, maybe not, and it ought
to have similar issues...
	* kernfs, configfs: -ECHILD.  And git rm is _very_ tempting after
reading that code...
	* lustre: hell knows, maybe always -ECHILD, maybe something like NFS.
	* XFS: see above.
	* hppfs: agreed to kill it off
	* autofs: not sure; it would be almost the usual fast symlink, if not
for the fact that it marks an object reached from dentry as "used now".
With RCU pathwalk it's _probably_ harmless, but I'd like a confirmation from
autofs folks.
	* /proc/*/ns/*: in theory, we might make it handle RCU mode, but
it's probably easier to say "just bugger off"
	* /proc/*/fd/*, /proc/*/exe, /proc/*/cwd, /proc/*/root: in principle
doable, but not without serious massage.
	* /proc/*/map_files/*: -ECHILD.
	* overlayfs: usual "use GFP_ATOMIC in RCU mode, treat failures as
-ECHILD".
	* ecryptfs: -ECHILD (and its use of ->readlink() is fishy, IMO).

I agree that unlazy_walk() attempted when walking a symlink ought to fail
with -ECHILD; we can't legitimize the symlink itself, so once we are out
of RCU mode, there's nothing to hold the inode of symlink (and its body)
from getting freed.  Solution is wrong though; for example, when
nested symlink occurs in the middle of a trailing one, we should *not*
remove the flag upon leaving the nested symlink.

Another unpleasant thing is that ->follow_link() saying "can't do that in
RCU mode" ends up with restart from scratch - that actually risks to be
worse than the mainline; there we would attempt unlazy_walk() and normally
it would've succeed.

AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
body and we might still need to access it"...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html