On Thu, 5 Mar 2015 06:05:20 +0000 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote: > > Hi Al (and others), > > > > I wonder if you could look over this patchset. > > It allows RCU-walk to follow symlinks in many common cases, > > thus removing a surprising performance hit caused by using symlinks. > > > > The last could of patches make changes to XFS and NFS to support > > this but I haven't forwarded to the relevant lists yet. > > If/when the early code meets with approval I'll do that. > > > > The first patch almost certainly needs to be changed. I originally > > wrote this code when filesystems could see inside nameidata. > > It is now opaque so the simplest solution was to provide an > > accessor function. > > Maybe I should as a 'flags' arg to ->follow_link?? Or have > > ->follow_link and ->follow_link_rcu ?? > > What do you suggest? > > Umm... Some observations: > * now ->follow_link() can be called in RCU mode, which means > that it can race with fs shutdown; not a problem, except that now it > joins ->lookup() et.al. in "if some data structure is needed in RCU > case of that, make sure it's not destroyed without an RCU delay somewhere > between the entry into ->kill_sb() and destruction. > * highmem pages in symlinks: that BS shouldn't be allowed at > all. Just make sure that at least for those filesystems symlink inodes > get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that. > * are you sure that security_inode_follow_link() is OK to call in > RCU mode? > * what warranties are you giving for the lifetime of strings > passed to nd_set_link()? Right now it's "should not be freed until the > matching ->put_link()"; what happens for RCU mode? > * really nasty one: creat(2) on a dangling symlink. What's to > preserve the last component if you get into that symlink in RCU mode? > > TBH, I'm less than fond of passing nameidata to ->follow_link() at all, > flags or no flags. We could kill current->link_count and > current->total_link_count, replacing them with one void * current->nameidata > and taking counters into struct nameidata itself. Have places like e.g. > kern_path_locked() do > struct nameidata nd, *saved = set_nameidata(&nd); > ... > set_nameidata(saved); > with set_nameidata(p) doing this: > old = current->nameidata; > current->nameidata = p; > if (p) { > if (!old) { > p->link_count = 0; > p->total_link_count = 0; > } else { > p->link_count = old->link_count; > p->total_link_count = old->total_link_count; > } > } > return old; > > Then nd_set_link() et.al. would use current->nameidata instead of an > explicitly passed pointer and ->follow_link() instances wouldn't need > that opaque pointer passed to them at all. Wow, thanks for the quick response!!! I'll look into all those issues and get back to you when I have something coherent to say. NeilBrown
Attachment:
pgpcqDyR4LlRH.pgp
Description: OpenPGP digital signature