v6.2 client behaviour change (repeat access calls)?

I noticed a change in behaviour in the v6.2.x client versus v6.1.12 (and below).

We have some servers that mount Netapps in other locations, many
milliseconds away, and these contain apps and libs that get added to
LD_LIBRARY_PATH and PATH on remote login.

I then noticed that when I ssh'd into a server with these mounts, the
first login was normal: as the shell started I observed the expected
flurry of lookups, getattrs, and access calls, for a grand total of
only ~120 packets to the Netapp.

But when I disconnect and reconnect (ssh), I now see a flood of ACCESS
calls to the Netapp for a handful of repeating filehandles, which
looks something like:

 2700 85.942563180 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
FH: 0x7f36addc, [Check: RD LU MD XT DL]
 2701 85.999838796  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
(Call In 2700), [Allowed: RD LU MD XT DL]
 2702 85.999970825 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
FH: 0x7f36addc, [Check: RD LU MD XT DL]
 2703 86.055340946  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
(Call In 2702), [Allowed: RD LU MD XT DL]
 2704 86.056865308 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
FH: 0x7f36addc, [Check: RD LU MD XT DL]
 2705 86.112233415  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
(Call In 2704), [Allowed: RD LU MD XT DL]

This time the login totals 5000+ packets, which becomes very
noticeable when the Netapp is 50ms away.
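
To put a rough number on it (a worst-case estimate, assuming the
calls are fully serialized): 5000+ packets is ~2500 request/reply
round trips, and 2500 x 50ms is around two minutes of accumulated
wire time, versus ~60 round trips (about 3 seconds) for the first
login's ~120 packets.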

I didn't understand why the first login was fine but subsequent
logins go into this repeating ACCESS pattern. I set actimeo=3600
(long), but it does not seem to have any effect.

I do not see this prior to v6.2: there, repeated logins are just as
fast as the first, and the repeating ACCESS calls are absent.

After a bit of digging through the v6.2 changes, these looked like
the relevant commit and follow-up patch:

commit 0eb43812c027 ("NFS: Clear the file access cache upon login")
[PATCH] NFS: Judge the file access cache's timestamp in rcu path?

Reverting both restored the prior (v6.1) performance.

What constitutes a login, exactly? We also have services like
"sysstat" or "pcp" that trigger systemd-logind regularly on our
machines... does that count as a login and invalidate the cache?
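
To make the question concrete, here is a tiny userspace model of how
I read that commit. It is a sketch only: the struct layout, the
uid-standing-in-for-full-creds simplification, and the timestamps are
all mine, not the kernel code (which, as far as I can tell, compares
full creds and walks the parent chain under RCU). The "login time"
appears to be the start time of the topmost ancestor process that
still carries the same credentials, and any access-cache entry
stamped before that time is treated as stale. A reconnect creates a
new session leader with a later start time, so every entry from the
previous session is discarded; if that reading is right, it would
also explain why actimeo has no effect here, since the check ignores
entry age entirely:

/* Userspace model of the login-time heuristic, built from my reading
 * of commit 0eb43812c027; everything below is illustrative, not the
 * kernel code. A process's "login time" is the start time of its
 * topmost ancestor that still has the same credentials (reduced here
 * to a uid). Cache entries older than that are treated as stale. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct task {
    const struct task *parent;  /* self-parented at the top */
    uint32_t uid;               /* stand-in for the full cred */
    uint64_t start_time;
};

/* Walk up while the parent carries the same creds; the last matching
 * ancestor is the "session" whose start_time is the login time. */
static uint64_t login_time(const struct task *t)
{
    while (t->parent != t && t->parent->uid == t->uid)
        t = t->parent;
    return t->start_time;
}

/* An access-cache entry survives only if it was created after the
 * login time of the task consulting it. */
static bool entry_valid(uint64_t entry_timestamp, const struct task *t)
{
    return entry_timestamp >= login_time(t);
}

int main(void)
{
    struct task init  = { .parent = &init,  .uid = 0,    .start_time = 1 };
    struct task sshd1 = { .parent = &init,  .uid = 1000, .start_time = 100 };
    struct task sh1   = { .parent = &sshd1, .uid = 1000, .start_time = 101 };
    uint64_t entry = 150;   /* cached ACCESS result from the first login */

    printf("first session reuses entry:  %s\n",
           entry_valid(entry, &sh1) ? "yes" : "no");

    /* Reconnecting spawns a new session leader with a later start
     * time, so the same entry now looks stale. */
    struct task sshd2 = { .parent = &init,  .uid = 1000, .start_time = 200 };
    struct task sh2   = { .parent = &sshd2, .uid = 1000, .start_time = 201 };

    printf("second session reuses entry: %s\n",
           entry_valid(entry, &sh2) ? "yes" : "no");
    return 0;
}

This prints "yes" for the first session and "no" for the second,
which matches the traces above: nothing ages out, a new login simply
starts with an empty access cache.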

Do the repeated ACCESS calls on the same handful of filehandles make
sense? Even prior to those patches (i.e. on v6.1) there are only a
couple of ACCESS calls to the Netapp on login.
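
My guess, as a back-of-envelope only (the numbers are made up): the
repeating filehandles are the PATH and LD_LIBRARY_PATH directories
themselves, and each exec during shell startup re-walks them. Once
the cache is cleared, a startup that execs 30 helpers against a
20-directory search path could generate 30 x 20 = 600 uncached ACCESS
calls on the same few directory filehandles, which is the shape of
the flood above.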

We are a bit unusual in that we run quite a few high-latency WAN NFS
workflows, so we are happy to trade long-lived caches (e.g. a long
actimeo, and even nocto on occasion) for fewer ops, at the expense of
total correctness.

Cheers,

Daire



