Re: incorrect EXDEV error occasionally returned for rename over NFS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 17, 2010 at 05:39:01PM -0600, Joe Habermann wrote:
> OK, so I think I have an approach that may work and not require major
> refactoring.  The basic idea is to maintain a sequence number for the
> svc_export_cache.  We snap the sequence at the top of nfsd_rename.
> And we snap it again after we've called fh_verify for the "to" directory.
> If the sequence number changed, then the cache changed and trying
> to compare the export pointers can lead to false positives or false negatives.
> In this case, throw away export and the dentry information and start over.
> 
> Other changes:
> 1) got rid of "inuse" in cache_detail.  It really wasn't being used.
> 2) Remove check for ex_path.mnt near the bottom of nfsd_rename.
> Can't we assert that ffhp->fh_export == tfhp->fh_export?
> 3) Threw in a stat to see how often we have to retry ("rename_retries")
> 
> I'm testing with this now and so far so good but
> comments/corrections/suggestions
> welcome since I'm very new to this code.

Hm, thanks, that looks like an approach that should work.

It sure would be simpler, though if we could just solve the problem by
replacing the comparison of export pointers by a better comparison....

I understand your concern about svc_export_match--I'd need to reread
some code to figure out if we can have duplicate auth_domains in the
same way we can with exports--but if so we can always just do a strcmp()
of the ex_client->name's instead.

How about trying that?  Just define a small helper function that does
those checks, and call it from rename in place of the export
comparisons?

--b.

> 
> Thanks, -Joe
> 
> --- ../linux-2.6.32.12-0.7/./include/linux/sunrpc/cache.h
> 2010-05-20 05:06:54.000000000 -0500
> +++ ./include/linux/sunrpc/cache.h      2011-02-11 16:04:19.000000000 -0600
> @@ -74,7 +74,8 @@ struct cache_detail {
>         struct cache_head **    hash_table;
>         rwlock_t                hash_lock;
> 
> -       atomic_t                inuse; /* active user-space update or lookup */
> +       atomic_t                seqno; /* bumped when cache changes
> +                                       * not used by all caches */
> 
>         char                    *name;
>         void                    (*cache_put)(struct kref *);
> --- ../linux-2.6.32.12-0.7/./net/sunrpc/cache.c 2010-05-20
> 05:06:54.000000000 -0500
> +++ ./net/sunrpc/cache.c        2011-02-11 20:03:55.000000000 -0600
> @@ -96,6 +96,7 @@ struct cache_head *sunrpc_cache_lookup(s
>         *head = new;
>         detail->entries++;
>         cache_get(new);
> +       atomic_inc(&detail->seqno);
>         write_unlock(&detail->hash_lock);
> 
>         return new;
> @@ -141,6 +142,7 @@ struct cache_head *sunrpc_cache_update(s
>                         cache_fresh_locked(old, new->expiry_time);
>                         write_unlock(&detail->hash_lock);
>                         cache_fresh_unlocked(old, detail);
> +                       atomic_inc(&detail->seqno);
>                         return old;
>                 }
>                 write_unlock(&detail->hash_lock);
> @@ -170,6 +172,7 @@ struct cache_head *sunrpc_cache_update(s
>         cache_fresh_unlocked(tmp, detail);
>         cache_fresh_unlocked(old, detail);
>         cache_put(old, detail);
> +       atomic_inc(&detail->seqno);
>         return tmp;
>  }
>  EXPORT_SYMBOL_GPL(sunrpc_cache_update);
> @@ -241,11 +244,13 @@ int cache_check(struct cache_detail *det
>                                         cache_fresh_unlocked(h, detail);
>                                         rv = -ENOENT;
>                                 }
> +                               atomic_inc(&detail->seqno);
>                                 break;
> 
>                         case -EAGAIN:
>                                 clear_bit(CACHE_PENDING, &h->flags);
>                                 cache_revisit_request(h);
> +                               atomic_inc(&detail->seqno); /* not needed? */
>                                 break;
>                         }
>                 }
> @@ -327,7 +332,7 @@ static void sunrpc_destroy_cache_detail(
>         cache_purge(cd);
>         spin_lock(&cache_list_lock);
>         write_lock(&cd->hash_lock);
> -       if (cd->entries || atomic_read(&cd->inuse)) {
> +       if (cd->entries) {
>                 write_unlock(&cd->hash_lock);
>                 spin_unlock(&cache_list_lock);
>                 goto out;
> --- ../linux-2.6.32.12-0.7/./fs/nfsd/vfs.c      2010-05-20
> 05:06:54.000000000 -0500
> +++ ./fs/nfsd/vfs.c     2011-02-11 16:03:24.000000000 -0600
> @@ -1717,6 +1717,8 @@ out_nfserr:
>         goto out_unlock;
>  }
> 
> +static atomic_t rename_retries = ATOMIC_INIT(0);
> +
>  /*
>   * Rename a file
>   * N.B. After this call _both_ ffhp and tfhp need an fh_put
> @@ -1729,14 +1731,33 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>         struct inode    *fdir, *tdir;
>         __be32          err;
>         int             host_err;
> +       int             seqsav;
> 
> +retry:
> +       seqsav = atomic_read(&svc_export_cache.seqno);
>         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>         if (err)
>                 goto out;
>         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>         if (err)
>                 goto out;
> 
> +       /* if cache changed, the fh_export pointer comparison
> +        * below is not valid, in general */
> +       if (unlikely(seqsav != atomic_read(&svc_export_cache.seqno))) {
> +               cache_put(&ffhp->fh_export->h, &svc_export_cache);
> +               ffhp->fh_export = NULL;
> +               cache_put(&tfhp->fh_export->h, &svc_export_cache);
> +               tfhp->fh_export = NULL;
> +               dput(ffhp->fh_dentry);
> +               ffhp->fh_dentry = NULL;
> +               dput(tfhp->fh_dentry);
> +               tfhp->fh_dentry = NULL;
> +               atomic_inc(&rename_retries);
> +               goto retry;
> +       }
> +
>         fdentry = ffhp->fh_dentry;
>         fdir = fdentry->d_inode;
> 
> @@ -1785,9 +1806,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>                         goto out_dput_new;
>         }
> 
> -       host_err = -EXDEV;
> -       if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
> -               goto out_dput_new;
>         host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
>         if (host_err)
>                 goto out_dput_new;
> 
> 
> On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
> >> When running SVN checkouts over NFSv3, they occasionally fail with an
> >> EXDEV error:
> >>
> >> (etc.)
> >> U Â Âsnfs/build/Makefile.hpux
> >> svn: In directory 'snfs/build'
> >> svn: Can't move 'snfs/build/tempfile.tmp' to
> >> 'snfs/build/Makefile.sol': Invalid cross-device link
> >>
> >> I've noticed that other people have reported this problem but it remains
> >> unresolved. ÂThrough some debugging, I believe I have found root cause.
> >> nfsd_rename() contains the following code:
> >>
> >> 1707 /*
> >> 1708 Â* Rename a file
> >> 1709 Â* N.B. After this call _both_ ffhp and tfhp need an fh_put 1710 Â*/
> >> 1711 __be32
> >> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
> >> *fname, int flen,
> >> 1713 Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct svc_fh *tfhp, char *tname, int tlen)
> >> 1714 {
> >> 1715     struct dentry  *fdentry, *tdentry, *odentry, *ndentry, *trap;
> >> 1716     struct inode  Â*fdir, *tdir;
> >> 1717 Â Â Â Â __be32 Â Â Â Â Âerr;
> >> 1718     int       host_err;
> >> 1719
> >> 1720 Â Â Â Â err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
> >> 1721 Â Â Â Â if (err)
> >> 1722 Â Â Â Â Â Â Â Â goto out;
> >> 1723 Â Â Â Â err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
> >> 1724 Â Â Â Â if (err)
> >> 1725 Â Â Â Â Â Â Â Â goto out;
> >> 1726
> >> 1727 Â Â Â Â fdentry = ffhp->fh_dentry;
> >> 1728 Â Â Â Â fdir = fdentry->d_inode;
> >> 1729
> >> 1730 Â Â Â Â tdentry = tfhp->fh_dentry;
> >> 1731 Â Â Â Â tdir = tdentry->d_inode;
> >> 1732
> >> 1733 Â Â Â Â err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> >> 1734 Â Â Â Â if (ffhp->fh_export != tfhp->fh_export)
> >> 1735 Â Â Â Â Â Â Â Â goto out;
> >>
> >> Line 1734 compares the fh_export pointers in the file handles for
> >> the "from" and "to" directories. ÂHowever, this is a bad check since
> >> it's possible these pointers differ yet refer to the same underlying
> >> file system. ÂBased on code inspection, encached export pointers are
> >> only valid for 30 minutes before they expire and are reinserted through
> >> calls made between the kernel and rpc.mountd. ÂSo a race exists where the
> >> ffhp->fh_export is populated on line 1720, the export cache is updated,
> >> and tfhp->fh_export is set on line 1723 to a different value.
> >>
> >> This window can be cranked wide open by putting an "msleep(10);"
> >> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
> >> etc., and then running SVN checkouts in a loop from an NFS client.
> >> Within 30 minutes, the problem is seen.
> >
> > Good detective work, thanks. ÂHave you confirmed that applying your
> > change below eliminates the problem?
> >
> >> I'm still very new to the NFS code, but I'm wonder whether the check
> >> needs to be something like this instead:
> >>
> >> --- vfs.c.orig Â2010-11-15 14:35:18.000000000 -0600
> >> +++ vfs.c    2010-11-15 14:43:52.000000000 -0600
> >> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
> >> Â Âtdir = tdentry->d_inode;
> >>
> >> Â Âerr = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> >> - Â Â Â if (ffhp->fh_export != tfhp->fh_export)
> >> + Â Â Â if (fdir->i_sb != tdir->i_sb)
> >> Â Â Â Âgoto out;
> >>
> >> Â Âerr = nfserr_perm;
> >>
> >> Thoughts? ÂI think it's either that or some locking needs to be put in
> >> to guarantee that the file handles for the "to" and "from" directory
> >> always wind up with the same export pointer when they belong to the
> >> same file system. ÂI guess the other question is why there is an
> >> EXDEV check at the top of nfsd_rename() instead of putting it in
> >> vfs_rename().
> >
> > The fh_verify call is also responsible for setting the identity of the
> > user, and that may be affected by export options like root_squash. ÂIn
> > the above case if ffhp was on a root_squash export, and tfhp on a
> > no_root_squash export, and we get a request from uid 0, we'd end up
> > with an incoming uid-0 user running as root (since the fh_verify for
> > tfhp was performed last), and might allow renaming from a directory that
> > only root had write permission to.
> >
> > It's a minor hole (especially since it's already easy for a client to
> > circumvent protections that vary from export to export in the default
> > no_subtree_check case). ÂBut we should be able to do better.
> >
> > I'm not sure what to suggest.
> >
> > --b.
> >
> >>
> >> For full disclosure, I'm actually running the SLES11 SP1 kernel,
> >> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
> >> they appear to have the same code and I assume the same issue.
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux