Re: nfs problem [solved]

Eyal Lebedinsky <fedora@xxxxxxxxxxxxxx> · Sat, 3 Jun 2017 09:51:39 +1000

On 03/06/17 03:13, Rick Stevens wrote:
On 06/02/2017 05:07 AM, Eyal Lebedinsky wrote:
On 02/06/17 21:48, Roger Heflin wrote:
If the machine mounting the file and doing the tail has read from the
file, and new data is then added in that last block, and because of the
rate the data is coming into the file the timestamp on the file does
not change, then the client NFS host will not know that the last block
has changed and will not know to reread it (it is already in cache).
If it is this bug/feature, NFS has worked this way, I think, pretty much
forever at a larger scale (with 2 hosts each writing every other block,
if the timestamp does not change then each node will see the other's
blocks as empty because of its cache, at least until the timestamp
changes from what it knows it wrote). The trick my previous job
implemented was to make sure the timestamp on the file moved ahead at
least one second so that the clients knew the file changed. But if tail
is actively reading it while things are getting written into it, I
don't see a way it would be able to work that well.

What you are describing sounds like a variant of this issue.
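A minimal Python sketch of the timestamp trick Roger describes, assuming
the writer appends to the file itself and that pushing mtime forward with
os.utime() is acceptable for the application. The function name
append_and_bump_mtime is hypothetical. It appends, fsyncs, and then makes
sure the file's mtime ends up at least one second past its value before
the write, so a client comparing timestamps should see a change.

```python
import os

ONE_SECOND_NS = 1_000_000_000


def append_and_bump_mtime(path: str, data: bytes) -> int:
    """Append data to path, then ensure the file's mtime ends up at least
    one second later than it was before the write (hypothetical helper).

    Returns the resulting mtime in nanoseconds.
    """
    try:
        before_ns = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        before_ns = None  # new file: no previous timestamp to beat

    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out before touching mtime

    after = os.stat(path)
    target_ns = after.st_mtime_ns
    if before_ns is not None:
        # Never move backwards, but always land at least one second ahead.
        target_ns = max(target_ns, before_ns + ONE_SECOND_NS)
    if target_ns != after.st_mtime_ns:
        # Keep atime as it is, only move mtime forward.
        os.utime(path, ns=(after.st_atime_ns, target_ns))
    return target_ns
```

One consequence of this design: if writes arrive more often than once a
second, each call pushes mtime further ahead of the wall clock, so the
file's timestamp can drift into the future.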

Thanks Roger,

Interesting, though I wonder why it worked very well until the latest
kernel series (3.10/3.11), which started showing the problem. Looks like
a new "feature" to me.

BTW, the server is also the time server and the two are well
synchronised. When a zero block shows up it can take a minute or two
before the real data shows up. I use 'less' to view the file, hit
refresh (Shift-G) and soon a line of zeroes comes along. I kept
refreshing for a few minutes until the good data showed.
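To catch the symptom from a script rather than by refreshing 'less',
something like the following Python sketch could scan the end of the
file, as the client currently sees it, for runs of NUL bytes. The
function names and the 16-byte threshold are assumptions for
illustration; a text log should normally contain no NULs at all, but a
binary file might, so the threshold would need adjusting there.

```python
from __future__ import annotations

import os
import re


def find_nul_runs(data: bytes, min_run: int = 16) -> list[tuple[int, int]]:
    """Return (offset, length) for every run of at least min_run NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def check_tail(path: str, tail_bytes: int = 65536,
               min_run: int = 16) -> list[tuple[int, int]]:
    """Scan the last tail_bytes of path and return NUL runs as
    (absolute file offset, length) pairs. An empty list means no zero
    block was seen in that window."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
        start = max(0, size - tail_bytes)
        f.seek(start)
        data = f.read()
    return [(start + off, n) for off, n in find_nul_runs(data, min_run)]
```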

When I originally noticed the problem (a monitoring script started
showing garbage), the monitored file was updated once a minute and it
needed to be updated two or three times before the real data was
exported, which I consider a rather long time for a file to present
wrong content (over NFS).

Maybe there is an export (or mount) option I can use?

Also, I could not find a reference to this problem when I investigated
the issue initially, and as such I assumed it was my setup. But the
server (F19) had no updates or changes for a long while. It is clearly
the new kernels exposing this, and I tested more than one client machine
to verify that they also show the issue.

Newer kernels use NFSv4 by default. I can't remember what F19 uses
natively or if it has issues with NFSv4 clients (it may not really
implement NFSv4 properly or may improperly negotiate protocol changes).
You might try forcing NFSv3 mounts and see if that clears the problem.

Hi Rick,

Good call. I set mount option 'nfsvers=3' and the problem went away.
Kernel 3.9 probably did not implement v4 that well.

To be sure, I mounted one fs as nfsvers=3 and another as the default
(mount says 4.1) and the problem does not show on the first but does
show on the second.
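The same side-by-side check can be scripted. This Python sketch
(hypothetical helpers nfs_versions and read_local_nfs_versions) parses
the Linux /proc/mounts format and reports the version from the vers=
option that the kernel lists for NFS mounts. It assumes a Linux client;
mount points containing spaces appear octal-escaped (e.g. \040) in
/proc/mounts and are returned that way.

```python
from __future__ import annotations


def nfs_versions(mounts_text: str) -> dict[str, str]:
    """Map each NFS mount point to the version in its vers= option.

    mounts_text is expected in /proc/mounts format:
    source target fstype options dump pass
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _source, target, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):  # skips e.g. the "nfsd" pseudo-fs
            continue
        versions[target] = "unknown"
        for opt in options.split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                versions[target] = value
                break
    return versions


def read_local_nfs_versions(path: str = "/proc/mounts") -> dict[str, str]:
    """Convenience wrapper for a Linux client."""
    with open(path) as f:
        return nfs_versions(f.read())
```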

Thanks
	Eyal
You may want to look at the "noac" option on the clients as well as the
"acregmin", "acregmax", "acdirmin", "acdirmax" and "actimeo" values
(see "man 5 nfs"). Defaults and such have changed with different
kernels and perhaps there's some incompatibility.
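For trying these options one at a time, a small Python sketch that only
builds the mount(8) command line. nfs_mount_argv is a hypothetical
helper; nfsvers, noac and actimeo are the option names documented in
nfs(5). Running the result needs root, e.g. subprocess.run(argv,
check=True). Since noac already turns attribute caching off, combining
it with actimeo is treated as an error here.

```python
from __future__ import annotations


def nfs_mount_argv(source: str, target: str, nfsvers: str | None = None,
                   noac: bool = False, actimeo: int | None = None) -> list[str]:
    """Build a mount(8) argv for an NFS export (hypothetical helper).

    nfsvers, noac and actimeo are passed through as the -o options of
    the same names described in nfs(5).
    """
    if noac and actimeo is not None:
        # noac disables attribute caching; actimeo would be redundant.
        raise ValueError("use either noac or actimeo, not both")
    opts = []
    if nfsvers is not None:
        opts.append(f"nfsvers={nfsvers}")
    if noac:
        opts.append("noac")
    if actimeo is not None:
        opts.append(f"actimeo={actimeo}")
    argv = ["mount", "-t", "nfs"]
    if opts:
        argv += ["-o", ",".join(opts)]
    return argv + [source, target]
```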
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@xxxxxxxxxxxxxx -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-      Cuteness can be overcome through sufficient bastardry         -
-                                         --Mark 'Kamikaze' Hughes   -
----------------------------------------------------------------------
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx

--
Eyal Lebedinsky (fedora@xxxxxxxxxxxxxx)