On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote: > Hi Al (and others), > > I wonder if you could look over this patchset. > It allows RCU-walk to follow symlinks in many common cases, > thus removing a surprising performance hit caused by using symlinks. > > The last could of patches make changes to XFS and NFS to support > this but I haven't forwarded to the relevant lists yet. > If/when the early code meets with approval I'll do that. > > The first patch almost certainly needs to be changed. I originally > wrote this code when filesystems could see inside nameidata. > It is now opaque so the simplest solution was to provide an > accessor function. > Maybe I should as a 'flags' arg to ->follow_link?? Or have > ->follow_link and ->follow_link_rcu ?? > What do you suggest? Umm... Some observations: * now ->follow_link() can be called in RCU mode, which means that it can race with fs shutdown; not a problem, except that now it joins ->lookup() et.al. in "if some data structure is needed in RCU case of that, make sure it's not destroyed without an RCU delay somewhere between the entry into ->kill_sb() and destruction. * highmem pages in symlinks: that BS shouldn't be allowed at all. Just make sure that at least for those filesystems symlink inodes get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that. * are you sure that security_inode_follow_link() is OK to call in RCU mode? * what warranties are you giving for the lifetime of strings passed to nd_set_link()? Right now it's "should not be freed until the matching ->put_link()"; what happens for RCU mode? * really nasty one: creat(2) on a dangling symlink. What's to preserve the last component if you get into that symlink in RCU mode? TBH, I'm less than fond of passing nameidata to ->follow_link() at all, flags or no flags. We could kill current->link_count and current->total_link_count, replacing them with one void * current->nameidata and taking counters into struct nameidata itself. Have places like e.g. kern_path_locked() do struct nameidata nd, *saved = set_nameidata(&nd); ... set_nameidata(saved); with set_nameidata(p) doing this: old = current->nameidata; current->nameidata = p; if (p) { if (!old) { p->link_count = 0; p->total_link_count = 0; } else { p->link_count = old->link_count; p->total_link_count = old->total_link_count; } } return old; Then nd_set_link() et.al. would use current->nameidata instead of an explicitly passed pointer and ->follow_link() instances wouldn't need that opaque pointer passed to them at all. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html