Re: [PATCH V2] make "noac" and "actimeo=0" work correctly

Chuck Lever wrote:
On Thu, Jul 10, 2008 at 1:41 PM, Peter Staubach <staubach@xxxxxxxxxx> wrote:
Chuck Lever wrote:
Hi Peter-


Hi, Chuck.

On Tue, Jul 8, 2008 at 12:08 PM, Peter Staubach <staubach@xxxxxxxxxx>
wrote:

Hi.

I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes.  It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)

In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it.  It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo].  The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo).  With this
change, the options, "noac" and "actimeo=0", work as originally
expected.
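
The difference between the two intervals can be sketched in userspace. This is only an illustration, not the patch itself: tick_after_eq(), tick_before(), cache_valid_closed() and cache_valid_open() are hypothetical names, and the signed-difference comparison is an assumption about how the kernel's jiffies helpers behave.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's jiffies comparisons.
 * Comparing via a signed difference keeps the ordering across a wrap. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Closed interval [start, start + timeout]: the behaviour described as buggy. */
static bool cache_valid_closed(unsigned long now, unsigned long start,
			       unsigned long timeout)
{
	return tick_after_eq(now, start) && tick_after_eq(start + timeout, now);
}

/* Half-open interval [start, start + timeout): the proposed behaviour. */
static bool cache_valid_open(unsigned long now, unsigned long start,
			     unsigned long timeout)
{
	return tick_after_eq(now, start) && tick_before(now, start + timeout);
}
```

With a timeout of 0, cache_valid_closed(1000, 1000, 0) still reports the cache as valid at the tick it was filled, while cache_valid_open(1000, 1000, 0) never does. With a timeout of 3, the half-open form is valid at ticks 1000 through 1002 and expired at 1003.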

While I was there, I addressed a problem with the jiffies
overflowing on 32 bit systems.  When overflow occurs, the
value of read_cache_jiffies + attrtimeo can be less than the
value of read_cache_jiffies.  This would cause an unnecessary
GETATTR over the wire.
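
A rough sketch of the overflow, assuming a 32-bit tick counter. The type tick32_t and both functions are hypothetical; expired_naive() is a plain comparison written for illustration and is not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit tick counter, standing in for jiffies on a
 * 32-bit system. */
typedef uint32_t tick32_t;

/* Plain comparison against the end of the window. If start + timeout
 * wraps past UINT32_MAX, 'end' becomes a small number and the cache
 * looks expired long before it really is. */
static bool expired_naive(tick32_t now, tick32_t start, tick32_t timeout)
{
	tick32_t end = start + timeout;	/* may wrap */

	return now >= end;
}

/* Compare elapsed ticks instead. Unsigned subtraction is modulo 2^32,
 * so now - start is the true elapsed count even across a wrap. */
static bool expired_elapsed(tick32_t now, tick32_t start, tick32_t timeout)
{
	return (tick32_t)(now - start) >= timeout;
}
```

With start = 0xFFFFFFF0 and timeout = 0x20, the end of the window wraps to 0x10. Eight ticks later (now = 0xFFFFFFF8) expired_naive() already reports expiry, which would trigger the unnecessary GETATTR, while expired_elapsed() does not report expiry until now reaches 0x10.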

Thoughts and/or comments?  This is an updated patch which includes
the previous support which was added to correct the noac/actimeo=0
handling.

A couple of random thoughts below.


Some thoughts in response --

Signed-off-by: Peter Staubach <staubach@xxxxxxxxxx>


--- linux-2.6.25.i686/fs/nfs/dir.c.org
+++ linux-2.6.25.i686/fs/nfs/dir.c
@@ -1808,7 +1808,7 @@ static int nfs_access_get_cached(struct
      cache = nfs_access_search_rbtree(inode, cred);
      if (cache == NULL)
              goto out;
-       if (!time_in_range(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo))
+       if (!nfs_time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo))
              goto out_stale;
      res->jiffies = cache->jiffies;
      res->cred = cache->cred;
--- linux-2.6.25.i686/fs/nfs/inode.c.org
+++ linux-2.6.25.i686/fs/nfs/inode.c
@@ -706,14 +706,7 @@ int nfs_attribute_timeout(struct inode *

      if (nfs_have_delegation(inode, FMODE_READ))
              return 0;
-       /*
-        * Special case: if the attribute timeout is set to 0, then always
-        *               treat the cache as having expired (unless holding
-        *               a delegation).
-        */
-       if (nfsi->attrtimeo == 0)
-               return 1;
-       return !time_in_range(jiffies, nfsi->read_cache_jiffies, nfsi->read_cache_jiffies + nfsi->attrtimeo);
+       return !nfs_time_in_range_open(jiffies, nfsi->read_cache_jiffies, nfsi->read_cache_jiffies + nfsi->attrtimeo);
 }

 /**
@@ -1098,7 +1091,7 @@ static int nfs_update_inode(struct inode
              nfsi->attrtimeo_timestamp = now;
              nfsi->last_updated = now;
      } else {
-               if (!time_in_range(now, nfsi->attrtimeo_timestamp, nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
+               if (!nfs_time_in_range_open(now, nfsi->attrtimeo_timestamp, nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
                      if ((nfsi->attrtimeo <<= 1) > NFS_MAXATTRTIMEO(inode))
                              nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
                      nfsi->attrtimeo_timestamp = now;
--- linux-2.6.25.i686/include/linux/nfs_fs.h.org
+++ linux-2.6.25.i686/include/linux/nfs_fs.h
@@ -121,7 +121,7 @@ struct nfs_inode {
       *
       * We need to revalidate the cached attrs for this inode if
       *
-        *      jiffies - read_cache_jiffies > attrtimeo
+        *      jiffies - read_cache_jiffies >= attrtimeo
       */
      unsigned long           read_cache_jiffies;
      unsigned long           attrtimeo;
@@ -244,6 +244,22 @@ static inline unsigned NFS_MAXATTRTIMEO(
      return S_ISDIR(inode->i_mode) ? nfss->acdirmax : nfss->acregmax;
 }

+static inline int nfs_time_in_range_open(unsigned long a,
+                               unsigned long b, unsigned long c)

All of nfs_time_in_range_open's callers use a sum of 'b' and
'nfsi->attrtimeo' for 'c'.  Would it be cleaner to pass in
'nfsi->attrtimeo' rather than 'b + nfsi->attrtimeo' and do the sum
here?  It might make sense to explicitly check nfsi->attrtimeo for
zero here to document the special behavior of "actimeo=0".


The behavior of "actimeo=0" isn't any more special than "actimeo=1".
It simply indicates that the attribute timeout is 0 jiffies long.

Right.  I'm simply suggesting that adding explicit code is good
documentation for this case.  It calls it out so developers remember
to check that case when they change this code.

You are correct that "noac/actimeo=0" is not the common case; however,
it is a case that gets ignored and therefore broken easily, and that
usually results in corruption of a customer's data.

I thought about reducing the arguments, but it didn't seem to yield
anything any clearer to me.

Alternately, checking explicitly if b and c are equal might accomplish
the same without changing the synopsis.

Also, all of nfs_time_in_range_open's callers negate the return value.
 Would the code in the callers be cleaner if that negation was moved
into nfs_time_in_range_open?  You might rename
nfs_time_in_range_open() as nfs_cache_has_expired(), for example, to
make the 'if' statements in the callers make sense in English.

My feeling is that if you have to sit and stare at this for 5 minutes
to understand precisely how it works, it has already become too
obfuscated.  In addition to fixing any bugs, I wonder if we can make
it easier to understand and maintain as well.


+{
+       /*
+        * If c is less than b, then the jiffies have wrapped.
+        * If so, then check to see if a is between b and the
+        * max jiffies value or between 0 and the value of c.
+        * This is the range between b and c.

include/linux/jiffies.h claims it handles jiffy wrapping correctly.
Why isn't time_in_range() sufficient if 'c' has wrapped?  If it isn't,
should you fix time_in_range() too?
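
Whether a closed-range check survives a wrapped 'c' depends on how the comparison helpers are written. Below is a minimal userspace sketch, assuming they compare via a signed difference, as the time_after_eq() and time_before_eq() macros in include/linux/jiffies.h are understood to do. The t_* names are hypothetical stand-ins, and the kernel's typecheck() is left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace approximations of the jiffies.h helpers. The signed
 * difference is an assumption about how the kernel macros work. */
static bool t_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static bool t_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/* Closed range [b, c], matching how time_in_range() is described
 * in this thread. */
static bool t_in_range(unsigned long a, unsigned long b, unsigned long c)
{
	return t_after_eq(a, b) && t_before_eq(a, c);
}

/* Start of a window six ticks before the counter wraps. */
static unsigned long wrap_start(void)
{
	return ULONG_MAX - 5;
}
```

Take b = wrap_start() and c = b + 10, so that c wraps around to 4. Under the signed-difference assumption, t_in_range() still accepts a = 2 and a = 4 and rejects a = 5 and a = b - 1, with no explicit 'c < b' test. This holds only while the window is shorter than half the counter's span. With raw >= and <= comparisons the wrapped case would fail.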


Clearly, time_in_range() is not sufficient if 'c' has
wrapped.  It only tests to see if a >= b and a <= c.  If 'c'
is less than 'b', then time_in_range() will return false.

I am reluctant to fix time_in_range() because I don't know
that it is broken.  It appears to me that it works for other
uses, because otherwise, someone would have "fixed" it.

The only callers I found are the NFS client and the RPC client's auth
cache, so it is probably safe to change time_in_range() without
concern for breaking someone else's code.  It's all ours, baby :-)

<fleite@xxxxxxxxxx> introduced time_in_range() a year ago with commit
c7e15961 for, it appears from his patch description, very similar
reasons to your fix.  It might be a good idea to discuss the wrapping
bug with him.

You could then simplify this to "return b != c && time_in_range(a, b,
c);" or something like that.  Or if you negate the return value here:

static inline int nfs_attributes_have_expired(unsigned long now,
                        unsigned long start, unsigned long end)
{
       /* 'now' rather than 'current', which is a macro in the kernel */
       return (start == end) || !time_in_range(now, start, end);
}

My 0.02USD.


The change, which makes attrtimeo=0 work for free, is to figure out
that if the attrtimeo is N, then the attribute cache is valid from
time, T, to T + N - 1, not T + N.  Thus, the current attribute
cache implementation is off by one because the attribute cache
should expire at time, T + N.  The time_in_range() macro was handy
and looked right, but wasn't quite right for the desired semantics.

Adding tests to check to see if b and c are equal is tuning for
the wrong case, I think.  I believe that the majority of file
systems are not mounted with "noac" or "actimeo=0", so the extra
test would just be overhead for the common case.

True enough, but you can "fix" that simply by reversing the two checks:

   return !time_in_range(a, b, c) || unlikely(b == c);
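
A userspace sketch of that reordered check. Every name here is hypothetical; unlikely() is approximated with the GCC/Clang __builtin_expect() builtin and falls back to a plain expression elsewhere, and the signed-difference range check is an assumption about how time_in_range() behaves.

```c
#include <stdbool.h>

/* Branch hint: an approximation of the kernel's unlikely(). */
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Closed range [start, end], compared via signed differences. */
static bool in_range_closed(unsigned long now, unsigned long start,
			    unsigned long end)
{
	return (long)(now - start) >= 0 && (long)(end - now) >= 0;
}

/* Range check first (the common case); the zero-length window
 * ("noac" / "actimeo=0") is called out explicitly afterwards. */
static bool attrs_have_expired(unsigned long now, unsigned long start,
			       unsigned long end)
{
	return !in_range_closed(now, start, end) || unlikely(start == end);
}
```

Here attrs_have_expired(1000, 1000, 1000) reports expiry, covering the zero-timeout case. The explicit test does nothing for non-zero timeouts: attrs_have_expired(1003, 1000, 1003) still reports the cache as valid, so a three-tick timeout remains valid for four ticks under the closed range.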

Again, I think there is some value in explicitly documenting the
actimeo=0 case here whether or not it is covered implicitly by
time_in_range(), precisely because it is not the common case and is
often forgotten when changing attribute cache-related logic.  This is
exactly why we are now here fixing this problem!

The comments you added here nicely explain the complexity of the time
checks, but do not explicitly state that actimeo=0 must work after any
changes to this code -- one of the important reasons that you have
open-coded the time comparisons rather than reusing time_in_range().

For me this is one of those times where cleverly folding all the cases
into a single group of logic makes the code less good because it
increases the chances of breakage later on, for example if
time_in_range() is changed by someone else who doesn't have local
knowledge of NFS.


This was really just an off by one bug.  _All_ attribute cache
timeouts are one clock tick too long.

Adding unlikely() around the test may help to reduce its cost,
but I don't think that it will make it zero cost.  Ordering the
tests will also help to minimize the cost, but still won't make
the additional test zero cost.

Actually, the _only_ reason that I implemented nfs_time_in_range_open
instead of just modifying time_in_range() was that I didn't want
to impact things that were orthogonal to the bug that I needed to
fix.  Given that time_in_range() is only used by NFS and RPC,
perhaps we can safely modify it.  If time_in_range() had been
used only by those three tests, I would have simply updated it.

Simply correcting the math gets us the desired functionality for
zero additional cost over the broken support.  In my viewpoint,
it is also the easiest to understand because there won't be any
special cases to worry about and the math will match the desired
semantics.

   Thanx...

      ps

+        *
+        * Otherwise, just check to see whether a is in [b, c).
+        */
+       if (c < b)
+               return time_after_eq(a, b) || time_before(a, c);
+       return time_after_eq(a, b) && time_before(a, c);
+}
+
 static inline int NFS_STALE(const struct inode *inode)
 {
      return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);





--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
