RE: [RFC] ->encode_fh() and related breakage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Al Viro [mailto:viro@xxxxxxxxxxxxxxxxxx]
> Sent: Thursday, December 29, 2011 11:27 PM
> To: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: Linus Torvalds; Sage Weil; Dan Magenheimer
> Subject: [RFC] ->encode_fh() and related breakage
> 
> 	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
> on-stack(!) struct dentry is a lousy idea.

Indeed. See:  https://lkml.org/lkml/2011/2/24/378 
Chris Mason (now cc'ed) suggested (offlist) this short-term hack, which
was a fix to a reported issue** from a user of btrfs+cleancache+zcache
and the above lkml post was intended to log the hack for later review.

Thanks, Al, for picking up the ball I dropped and neglected.

** https://lkml.org/lkml/2011/2/13/163 
 
> 	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
> if the last argument is 0, most of the instances do not look at anything
> other than dentry->d_inode.  Guess what happens with ceph?  Right, we
> end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
> struct dentry d is auto, has no initializer and the only thing done to
> it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
> and in case if you happen to pass that spin_lock(), dget(random_pointer)
> will get you an increment of value in memory at random address.  Fun...
> Moreover, on FAT we get dereference (read-only, but still it's at least an
> oops) of uninitialized pointer - d.d_parent->d_inode is followed.

Fortunately, cleancache is per-filesystem opt-in so that encode_fh
call from cleancache_get_key (in mm/cleancache.c) would never occur
for those filesystems.

> 	e) in case of default encode_fh, cleancache gets one difference from
> exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
> non-default encode_fh() i_generation is *not* ignored.
> 
> 	f) in case of _no_ export operations being present, cleancache is
> basically praying for inumbers being stable for some unknown reason.  Which
> is about as efficient as prayers tend to be...

What cleancache is looking for is a persistent 192-bit file identifier
that is unique and reproducible, regardless of whether the dentry is in-cache.
There should be no way for that unique identifier to become incoherent
with pages on disk belonging to that inode unless the inode is
in memory and cleancache_flush/invalidate_inode is called. Any fs
that cannot guarantee this uniqueness and coherency should not enable
cleancache, which is a good reason why cleancache is per-fs opt-in.

The use case (in case you are baffled):  Think of cleancache as being
huge, NOT part of the main memory of the system and so NOT subject
to normal system memory pressure.  We want the data pages to remain
in cleancache as long as possible, regardless of whether the inode
or dentry are in-cache.

> 	One kinda-sorta solution (besides kicking cleancache out of tree)

8-/

> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.

> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)

I won't pretend to be worthy to comment on the rest of the patch
or the change-of-API issues but FWIW this API change and the matching
cleancache changes below:

Acked-by: Dan Magenheimer <dan.magenheimer.com>

> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index bcaae4c..668c74f 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
>  static int cleancache_get_key(struct inode *inode,
>  			      struct cleancache_filekey *key)
>  {
> -	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
> +	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
>  	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
>  	struct super_block *sb = inode->i_sb;
> 
> @@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
>  	if (sb->s_export_op != NULL) {
>  		fhfn = sb->s_export_op->encode_fh;
>  		if  (fhfn) {
> -			struct dentry d;
> -			d.d_inode = inode;
> -			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
> +			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
>  			if (len <= 0 || len == 255)
>  				return -1;
>  			if (maxlen > CLEANCACHE_KEY_MAX)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux