Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

On Fri, 26 Jul 2013 23:21:11 +0000
Larry Keegan <lk@xxxxxxxxxxxxxxx> wrote:
> On Fri, 26 Jul 2013 10:59:37 -0400
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > On Thu, Jul 25, 2013 at 05:05:26PM +0000, Larry Keegan wrote:
> > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > Larry Keegan <lk@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > > Dear Chaps,
> > > > > 
> > > > > I am experiencing some inexplicable NFS behaviour which I
> > > > > would like to run past you.
> > > > > 
> > > > > I have a linux NFS server running kernel 3.10.2 and some
> > > > > clients running the same. The server is actually a pair of
> > > > > identical machines serving up a small number of ext4
> > > > > filesystems atop drbd. They don't do much apart from serve
> > > > > home directories and deliver mail into them. These have
> > > > > worked just fine for aeons.
> > > > > 
> > > > > The problem I am seeing is that for the past month or so, on
> > > > > and off, one NFS client starts reporting stale NFS file
> > > > > handles on some part of the directory tree exported by the
> > > > > NFS server. During the outage the other parts of the same
> > > > > export remain unaffected. Then, some ten minutes to an hour
> > > > > later they're back to normal. Access to the affected
> > > > > sub-directories remains possible from the server (both
> > > > > directly and via nfs) and from other clients. There do not
> > > > > appear to be any errors on the underlying ext4 filesystems.
> > > > > 
> > > > > Each NFS client seems to get the heebie-jeebies over some
> > > > > directory or other pretty much independently. The problem
> > > > > affects all of the filesystems exported by the NFS server, but
> > > > > clearly I notice it first in home directories, and in
> > > > > particular in my dot subdirectories for things like my mail
> > > > > client and browser. I'd say something's up the spout about 20%
> > > > > of the time.
> > 
> > And the problem affects just that one directory?
> 
> Yes. It's almost always .claws-mail/tagsdb. Sometimes
> it's .claws-mail/mailmboxcache and sometimes it's (what you would
> call) .mozilla. I suspect this is because very little else is being
> actively changed.
> 
> > Other files and
> > directories on the same filesystem continue to be accessible?
> 
> Spot on. Furthermore, whilst one client is returning ESTALE the others
> are able to see and modify those same files as if there were no
> problems at all.
> 
> After however long it takes, the client which was getting ESTALE on
> those directories is back to normal. The client sees the latest
> version of the files if those files have been changed by another
> client in the meantime. IOW if I hadn't been there when the ESTALE
> had happened, I'd never have noticed.
> 
> However, if another client (or the server itself with its client hat
> on) starts to experience ESTALE on some directories or others, their
> errors can start and end completely independently. So, for instance I
> might have /home/larry/this/that inaccessible on one NFS client,
> /home/larry/the/other inaccessible on another NFS client, and
> /home/mary/quite/contrary on another NFS client. Each one bobs up
> and down with no apparent timing relationship with the others.
> 
> > > > > The server and clients are using nfs4, although for a while I
> > > > > tried nfs3 without any appreciable difference. I do not have
> > > > > CONFIG_FSCACHE set.
> > > > > 
> > > > > I wonder if anyone could tell me if they have ever come across
> > > > > this before, or what debugging settings might help me diagnose
> > > > > the problem?
> > > > Were these machines running older kernels before this started
> > > > happening? What kernel did you upgrade from if so?
> > > The full story is this:
> > > 
> > > I had a pair of boxes running kernel 3.4.3 with the aforementioned
> > > drbd pacemaker malarkey and some clients running the same.
> > > 
> > > Then I upgraded the machines by moving from plain old dos
> > > partitions to gpt. This necessitated a complete reload of
> > > everything, but there were no software changes. I can be sure that
> > > nothing else was changed because I build my entire operating
> > > system in one ginormous makefile.
> > > 
> > > Rapidly afterwards I switched the motherboards for ones with more
> > > PCI slots. There were no software changes except those relating to
> > > MAC addresses.
> > > 
> > > Next I moved from 100Mbit to gigabit hubs. Then the problems
> > > started.
> > 
> > So both the "good" and "bad" behavior were seen with the same 3.4.3
> > kernel?
> 
> Yes. I'm now running 3.10.2, but yes, 3.10.1, 3.10, 3.4.4 and 3.4.3
> all exhibit the same behaviour. I was running 3.10.2 when I made the
> network captures I spoke of.
> 
> However, when I first noticed the problem with kernel 3.4.3 it
> affected several filesystems and I thought the machines needed to be
> rebooted, but since then I've been toughing it out. I don't suppose
> the character of the problem has changed at all, but my experience of
> it has, if that makes sense.
> 
> > > Anyway, to cut a long story short, this problem seemed to me to
> > > be a file server problem so I replaced network cards, swapped
> > > hubs,
> > 
> > Including reverting back to your original configuration with 100Mbit
> > hubs?
> 
> No, guilty as charged. I haven't swapped back the /original/
> hubs, and I haven't reconstructed the old hardware arrangement exactly
> (it's a little difficult because those parts are now in use
> elsewhere), but I've done what I considered to be equivalent tests.
> I'll do some more swapping and see if I can shake something out.
> 
> Thank you for your suggestions.

Dear Chaps,

I've spent the last few days doing a variety of tests and I'm now
convinced that my hardware changes have nothing to do with the problem,
and that it only occurs when I'm using NFS 4. As it stands, all my boxes
are running 3.10.3 and have NFS 4 enabled in the kernel, but all NFS
mounts are performed with -o nfsvers=3. Everything is stable.
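
For the record, the client mounts now look something like this (the
server name and paths here are placeholders rather than my real ones):

client# mount -t nfs -o nfsvers=3 server:/export/home /home

or the equivalent line in /etc/fstab:

server:/export/home  /home  nfs  nfsvers=3  0  0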

When I claimed earlier that I still had problems despite using NFS 3,
I think that one of the computers was still using NFS 4 unbeknownst to
me. I'm sorry for spouting guff.
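
One way to double-check which version a client has actually negotiated
is to look at the live mount options, for instance:

client# nfsstat -m
client# grep nfs /proc/mounts

and look for the vers= field in the output.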

Part of my testing involved bonnie++. I was more than interested to
note that, with NFS 3, performance can be truly abysmal if an NFS export
has the sync option set and a client then mounts it with -o sync. Here
is a typical example from my tests:

client# bonnie++ -s 8g -m async
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
async            8G 53912  85 76221  16 37415   9 42827  75 101754   5 201.6  0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  9006  47 +++++ +++ 13676  40  8410  44 +++++ +++ 14587  39
async,8G,53912,85,76221,16,37415,9,42827,75,101754,5,201.6,0,16,9006,47,+++++,+++,13676,40,8410,44,+++++,+++,14587,39

client# bonnie++ -s 8g -m sync
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sync             8G 16288  29  3816   0  4358   1 55449  98 113439   6 344.2  1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   922   4 29133  12  1809   4   918   4  2066   5  1907   4
sync,8G,16288,29,3816,0,4358,1,55449,98,113439,6,344.2,1,16,922,4,29133,12,1809,4,918,4,2066,5,1907,4
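
For completeness, the sync case above corresponds to an export and
mount along these lines (hostnames and paths are illustrative, not my
actual configuration):

server# cat /etc/exports
/export/scratch  client(rw,sync,no_subtree_check)

client# mount -t nfs -o nfsvers=3,sync server:/export/scratch /mnt/scratch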

Both of the above tests were conducted on the same client machine, which
has 4x2.5GHz CPUs and 4GB of RAM, against a server with 2x2.5GHz CPUs
and 4GB of RAM. I'm using gigabit networking with 0% packet loss, and
the network is otherwise practically silent.

The underlying ext4 filesystem on the server, despite being encrypted
at the block-device level and mounted with -o barrier=1, yielded these
figures by way of comparison:

server# bonnie++ -s 8G -m raw
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
raw              8G 66873  98 140602  17 46965   7 38474  75 102117  10 227.7 0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
raw,8G,66873,98,140602,17,46965,7,38474,75,102117,10,227.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
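
For reference, the dm-crypt and ext4 stack behind those figures amounts
to something like the following (device and mapping names are made up
for illustration, and I'm showing a LUKS-style setup):

server# cryptsetup luksOpen /dev/sdb1 export_crypt
server# mount -t ext4 -o barrier=1 /dev/mapper/export_crypt /export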

These figures seem reasonable for a single SATA HDD in concert with
dmcrypt. Whilst I expected some degradation from exporting and mounting
with sync, I have to say I'm truly flabbergasted by the difference
between the sync and async figures. I can't help but think I am still
suffering from some sort of configuration problem. Do the numbers from
the NFS client seem unreasonable?

Yours,

Larry.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html