Re: [RFC PATCH 2/3] vfs: Add open by file handle support

"Aneesh Kumar K. V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> · Mon, 22 Feb 2010 11:43:43 +0530

On Sat, 20 Feb 2010 11:58:34 -0700, Andreas Dilger <adilger@xxxxxxx> wrote:
> On 2010-02-18, at 22:42, Aneesh Kumar K.V wrote:
> > +static struct dentry *handle_to_dentry(struct path *path,
> > +				struct file_handle *fh)
> > +{
> > +	inode = path->dentry->d_inode;
> > +	if (!S_ISREG(inode->i_mode) &&
> > +		!S_ISDIR(inode->i_mode) &&
> > +		!S_ISLNK(inode->i_mode)) {
> 
> Please follow normal indenting rules, like:
> 
> 	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> 	    !S_ISLNK(inode->i_mode)) {
> 
> > +	/* should we do some check on handle size ?*/
> > +	handle = kmalloc(handle_size, GFP_KERNEL);
> 
> Good question.  kmalloc() allows up to 32MB allocations, so a user  
> running 1024 threads calling this with bad handles could OOM most  
> systems today due to unswappable kernel allocations.  Should there be  
> an upper limit on the handle size, like 4096 bytes or so?  Should some  
> filesystem legitimately need a larger size the kernel can always be  
> changed without problems since the size isn't part of the ABI (it  
> should be an internal sanity check), but I can't imagine what needs to  
> be put into a file handle that would exceed this size.

At this point i will limit it to 4096 bytes.

> 
> > +	dentry = exportfs_decode_fh(path->mnt, (struct fid *)handle,
> > +					handle_size, fh->handle_type,
> > +					vfs_dentry_acceptable, NULL);
> 
> One serious problem I can see with this is that the handle only  
> encodes the ino+generation of the inode, and nothing about the device  
> itself.  NFS handles this by using the fsid to identify the  
> filesystem, but it would be highly confusing to applications if a  
> process with a different CWD gets a different file or an error trying  
> to open the file handle, or if a new filesystem gets mounted and the  
> file handle stops working.  Having the file handle able to positively  
> identify a single inode in the system is the most important property  
> it has, I think.
> 
> Probably one of the important use cases of open_by_handle() is to have  
> one process doing pathname resolution and slave threads doing work on  
> those handles.  Usually daemon threads change their working directory  
> to "/" to avoid pinning some directory.  Even without that, it would  
> be impossible for a process to create handles in 2 different  
> filesystems, which is totally broken:
> 
>           (CWD is /home/adilger)
>           name_to_handle("/usr/src/linux/COPYING", &fh);
> 
>           fd = open_by_handle(&fh, O_RDONLY);
>           (fd = -1; errno = ENOENT or EBADF or whatever)
> 

do_sys_open_by_handle actually takes an fd. I was planning to do an openat
style syscall also. That should handle the above problem. In case of the
new syscall dirfd will be a file descriptor for a file in the
particular filesystem.

> Without the handle identifying the filesystem in some way this is  
> mostly useless.  Encoding the device number would be a simple (though  
> non-robust) way to do this, a better way would be with the filesystem  
> UUID and adding an in-kernel UUID->sb mapping function (which is  
> useful for other things as well).
> 

But i understand this is an good requirement. I will see what best I
can do.

> > +long do_sys_open_by_handle(int dfd, struct file_handle *fh, int  
> > flags)
> > +{
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		/* Allow open by handle only by sysadmin */
> > +		return -EPERM;
> 
> Hmm, I guess this avoids some of the security concerns, but makes it a  
> lot less useful.  I was thinking this could be used for e.g. user NFS  
> serving or such, but if it is limited to root only then you might as  
> well just set up the in-kernel NFSd.  By making the handle hard to  
> forge (e.g. generate random key per superblock, sha1(ino+gen+key) and  
> store that into fh; someone with more security experience can think of  
> a better scheme) then you can reasonably safely dispense with the  
> CAP_SYS_ADMIN check because you can be sure that the proper path  
> traversal has been done by a trusted process and there is no more  
> exposure than unix socket fd passing.

If we make the handle depend on the random key, how do we make sure
that after a server crash when get the same inode with the same handle.
That would be a requirement for an NFS server right? We can definitely
encode device details or UUID details the same way nfs server does in
fh_compose. But i guess making it random. 

What are the issues of being able to guess the file handle ? May be
there are other ways to handle those.

> 
> > +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> > +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> 
> Please use normal indenting style, aligning on the parenthesis.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html