Re: incorrect EXDEV error occasionally returned for rename over NFS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



OK, so I think I have an approach that may work and not require major
refactoring.  The basic idea is to maintain a sequence number for the
svc_export_cache.  We snap the sequence at the top of nfsd_rename.
And we snap it again after we've called fh_verify for the "to" directory.
If the sequence number changed, then the cache changed and trying
to compare the export pointers can lead to false positives or false negatives.
In this case, throw away export and the dentry information and start over.

Other changes:
1) got rid of "inuse" in cache_detail.  It really wasn't being used.
2) Remove check for ex_path.mnt near the bottom of nfsd_rename.
Can't we assert that ffhp->fh_export == tfhp->fh_export?
3) Threw in a stat to see how often we have to retry ("rename_retries")

I'm testing with this now and so far so good but
comments/corrections/suggestions
welcome since I'm very new to this code.

Thanks, -Joe

--- ../linux-2.6.32.12-0.7/./include/linux/sunrpc/cache.h
2010-05-20 05:06:54.000000000 -0500
+++ ./include/linux/sunrpc/cache.h      2011-02-11 16:04:19.000000000 -0600
@@ -74,7 +74,8 @@ struct cache_detail {
        struct cache_head **    hash_table;
        rwlock_t                hash_lock;

-       atomic_t                inuse; /* active user-space update or lookup */
+       atomic_t                seqno; /* bumped when cache changes
+                                       * not used by all caches */

        char                    *name;
        void                    (*cache_put)(struct kref *);
--- ../linux-2.6.32.12-0.7/./net/sunrpc/cache.c 2010-05-20
05:06:54.000000000 -0500
+++ ./net/sunrpc/cache.c        2011-02-11 20:03:55.000000000 -0600
@@ -96,6 +96,7 @@ struct cache_head *sunrpc_cache_lookup(s
        *head = new;
        detail->entries++;
        cache_get(new);
+       atomic_inc(&detail->seqno);
        write_unlock(&detail->hash_lock);

        return new;
@@ -141,6 +142,7 @@ struct cache_head *sunrpc_cache_update(s
                        cache_fresh_locked(old, new->expiry_time);
                        write_unlock(&detail->hash_lock);
                        cache_fresh_unlocked(old, detail);
+                       atomic_inc(&detail->seqno);
                        return old;
                }
                write_unlock(&detail->hash_lock);
@@ -170,6 +172,7 @@ struct cache_head *sunrpc_cache_update(s
        cache_fresh_unlocked(tmp, detail);
        cache_fresh_unlocked(old, detail);
        cache_put(old, detail);
+       atomic_inc(&detail->seqno);
        return tmp;
 }
 EXPORT_SYMBOL_GPL(sunrpc_cache_update);
@@ -241,11 +244,13 @@ int cache_check(struct cache_detail *det
                                        cache_fresh_unlocked(h, detail);
                                        rv = -ENOENT;
                                }
+                               atomic_inc(&detail->seqno);
                                break;

                        case -EAGAIN:
                                clear_bit(CACHE_PENDING, &h->flags);
                                cache_revisit_request(h);
+                               atomic_inc(&detail->seqno); /* not needed? */
                                break;
                        }
                }
@@ -327,7 +332,7 @@ static void sunrpc_destroy_cache_detail(
        cache_purge(cd);
        spin_lock(&cache_list_lock);
        write_lock(&cd->hash_lock);
-       if (cd->entries || atomic_read(&cd->inuse)) {
+       if (cd->entries) {
                write_unlock(&cd->hash_lock);
                spin_unlock(&cache_list_lock);
                goto out;
--- ../linux-2.6.32.12-0.7/./fs/nfsd/vfs.c      2010-05-20
05:06:54.000000000 -0500
+++ ./fs/nfsd/vfs.c     2011-02-11 16:03:24.000000000 -0600
@@ -1717,6 +1717,8 @@ out_nfserr:
        goto out_unlock;
 }

+static atomic_t rename_retries = ATOMIC_INIT(0);
+
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1729,14 +1731,33 @@ nfsd_rename(struct svc_rqst *rqstp, stru
        struct inode    *fdir, *tdir;
        __be32          err;
        int             host_err;
+       int             seqsav;

+retry:
+       seqsav = atomic_read(&svc_export_cache.seqno);
        err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
        if (err)
                goto out;
        err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
        if (err)
                goto out;

+       /* if cache changed, the fh_export pointer comparison
+        * below is not valid, in general */
+       if (unlikely(seqsav != atomic_read(&svc_export_cache.seqno))) {
+               cache_put(&ffhp->fh_export->h, &svc_export_cache);
+               ffhp->fh_export = NULL;
+               cache_put(&tfhp->fh_export->h, &svc_export_cache);
+               tfhp->fh_export = NULL;
+               dput(ffhp->fh_dentry);
+               ffhp->fh_dentry = NULL;
+               dput(tfhp->fh_dentry);
+               tfhp->fh_dentry = NULL;
+               atomic_inc(&rename_retries);
+               goto retry;
+       }
+
        fdentry = ffhp->fh_dentry;
        fdir = fdentry->d_inode;

@@ -1785,9 +1806,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
                        goto out_dput_new;
        }

-       host_err = -EXDEV;
-       if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
-               goto out_dput_new;
        host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
        if (host_err)
                goto out_dput_new;


On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
>> When running SVN checkouts over NFSv3, they occasionally fail with an
>> EXDEV error:
>>
>> (etc.)
>> U    snfs/build/Makefile.hpux
>> svn: In directory 'snfs/build'
>> svn: Can't move 'snfs/build/tempfile.tmp' to
>> 'snfs/build/Makefile.sol': Invalid cross-device link
>>
>> I've noticed that other people have reported this problem but it remains
>> unresolved.  Through some debugging, I believe I have found root cause.
>> nfsd_rename() contains the following code:
>>
>> 1707 /*
>> 1708  * Rename a file
>> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
>> 1711 __be32
>> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
>> *fname, int flen,
>> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
>> 1714 {
>> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
>> 1716         struct inode    *fdir, *tdir;
>> 1717         __be32          err;
>> 1718         int             host_err;
>> 1719
>> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>> 1721         if (err)
>> 1722                 goto out;
>> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>> 1724         if (err)
>> 1725                 goto out;
>> 1726
>> 1727         fdentry = ffhp->fh_dentry;
>> 1728         fdir = fdentry->d_inode;
>> 1729
>> 1730         tdentry = tfhp->fh_dentry;
>> 1731         tdir = tdentry->d_inode;
>> 1732
>> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> 1734         if (ffhp->fh_export != tfhp->fh_export)
>> 1735                 goto out;
>>
>> Line 1734 compares the fh_export pointers in the file handles for
>> the "from" and "to" directories.  However, this is a bad check since
>> it's possible these pointers differ yet refer to the same underlying
>> file system.  Based on code inspection, encached export pointers are
>> only valid for 30 minutes before they expire and are reinserted through
>> calls made between the kernel and rpc.mountd.  So a race exists where the
>> ffhp->fh_export is populated on line 1720, the export cache is updated,
>> and tfhp->fh_export is set on line 1723 to a different value.
>>
>> This window can be cranked wide open by putting an "msleep(10);"
>> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
>> etc., and then running SVN checkouts in a loop from an NFS client.
>> Within 30 minutes, the problem is seen.
>
> Good detective work, thanks.  Have you confirmed that applying your
> change below eliminates the problem?
>
>> I'm still very new to the NFS code, but I'm wonder whether the check
>> needs to be something like this instead:
>>
>> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
>> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
>> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>>    tdir = tdentry->d_inode;
>>
>>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> -       if (ffhp->fh_export != tfhp->fh_export)
>> +       if (fdir->i_sb != tdir->i_sb)
>>        goto out;
>>
>>    err = nfserr_perm;
>>
>> Thoughts?  I think it's either that or some locking needs to be put in
>> to guarantee that the file handles for the "to" and "from" directory
>> always wind up with the same export pointer when they belong to the
>> same file system.  I guess the other question is why there is an
>> EXDEV check at the top of nfsd_rename() instead of putting it in
>> vfs_rename().
>
> The fh_verify call is also responsible for setting the identity of the
> user, and that may be affected by export options like root_squash.  In
> the above case if ffhp was on a root_squash export, and tfhp on a
> no_root_squash export, and we get a request from uid 0, we'd end up
> with an incoming uid-0 user running as root (since the fh_verify for
> tfhp was performed last), and might allow renaming from a directory that
> only root had write permission to.
>
> It's a minor hole (especially since it's already easy for a client to
> circumvent protections that vary from export to export in the default
> no_subtree_check case).  But we should be able to do better.
>
> I'm not sure what to suggest.
>
> --b.
>
>>
>> For full disclosure, I'm actually running the SLES11 SP1 kernel,
>> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
>> they appear to have the same code and I assume the same issue.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux