Re: v6.2 client behaviour change (repeat access calls)?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yes, sorry, I did see that patch was posted not long after I reported
the excessive repeat ACCESS calls but I have not had a chance to test
it.

It certainly looks promising. I will try to get to it next week and
update my original email.

Cheers,

Daire

On Thu, 23 Mar 2023 at 11:54, Thorsten Leemhuis
<regressions@xxxxxxxxxxxxx> wrote:
>
> On 20.03.23 13:34, Thorsten Leemhuis wrote:
> > On 10.03.23 11:43, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> [CCing the regression list, as it should be in the loop for regressions:
> >> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> >>
> >> [also adding the author of the culprit (Trond) and the second NFS client
> >> maintainer (Anna) to the list of recipients]
> >
> > Trond, sorry to bother you, but do you still have this regression report
> > on your radar? It looks a bit like it fall through the cracks, as I
> > don't see any reply since it was posted nearly two weeks ago. Or did
> > some progress to address this happen and I just missed it?
>
> Daire, it seems the NFS developers ignore my inquiries, sorry.
>
> But well, I noticed there is a patch that references the culprit you
> found in the bisection:
>
> https://lore.kernel.org/all/20230308080327.33906-1-chengen.du@xxxxxxxxxxxxx/
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=master&id=21fd9e8700de86d1169f6336e97d7a74916ed04a
>
> I wonder if you are aware of it or maybe even tested it already. To me
> it sounds like it could fix your problem, but this is not my area of
> expertise, so I might be totally wrong.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot ignore-activity
>
> >> [TLDR: I'm adding this report to the list of tracked Linux kernel
> >> regressions; the text you find below is based on a few templates
> >> paragraphs you might have encountered already in similar form.
> >> See link in footer if these mails annoy you.]
> >>
> >> On 07.03.23 18:38, Daire Byrne wrote:
> >>> I noticed a change in behaviour in the v6.2.x client versus v6.1.12 (and below).
> >>>
> >>> We have some servers that mount Netapps from different locations many
> >>> milliseconds away, and these contain apps and libs that get added to
> >>> the LD_LIBRARY_PATH and PATH on remote login.
> >>>
> >>> I then noticed that when I ssh'd into a remote server that had these
> >>> mounts and the shell was starting, the first login was normal and I
> >>> observed an expected flurry of lookups,getattrs and access calls for a
> >>> grand total of only ~120 packets to the Netapp.
> >>>
> >>> But when I disconnect and reconnect (ssh), now I see a flood of access
> >>> calls to the netapp for a handful of repeating filehandles which look
> >>> something like:
> >>>
> >>>  2700 85.942563180 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
> >>> FH: 0x7f36addc, [Check: RD LU MD XT DL]
> >>>  2701 85.999838796  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
> >>> (Call In 2700), [Allowed: RD LU MD XT DL]
> >>>  2702 85.999970825 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
> >>> FH: 0x7f36addc, [Check: RD LU MD XT DL]
> >>>  2703 86.055340946  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
> >>> (Call In 2702), [Allowed: RD LU MD XT DL]
> >>>  2704 86.056865308 10.23.112.10 → 10.23.21.11  NFS 254 V3 ACCESS Call,
> >>> FH: 0x7f36addc, [Check: RD LU MD XT DL]
> >>>  2705 86.112233415  10.23.21.11 → 10.23.112.10 NFS 190 V3 ACCESS Reply
> >>> (Call In 2704), [Allowed: RD LU MD XT DL]
> >>>
> >>> This time we total 5000+ packets for this login which becomes very
> >>> noticeable when the Netapp is 50ms away.
> >>>
> >>> I didn't understand why the first login was fine but the second goes
> >>> into this repeating access pattern. I set actimeo=3600 (long) but it
> >>> does not seem to affect it.
> >>>
> >>> I do not see this prior to v6.2 where repeated logins are equally fast
> >>> and we don't see the repeating access calls.
> >>>
> >>> So a bit of digging through the v6.2 changes and this looked like the
> >>> relevant change:
> >>>
> >>> commit 0eb43812c027 ("NFS: Clear the file access cache upon login”)
> >>> [PATCH] NFS: Judge the file access cache's timestamp in rcu path?
> >>>
> >>> I reverted those and got the prior (v6.1) performance.
> >>>
> >>> What constitutes a login exactly? I also have services like "sysstat"
> >>> or pcp that cause a systemd-logind to trigger regularly on our
> >>> machines.... does that count and invalidate the cache?
> >>>
> >>> Do the repeated access calls on the same handful of filehandles make
> >>> sense? Even prior to those patches (or v6.1) there are only a couple
> >>> of ACCESS calls to the Netapp on login.
> >>>
> >>> We are a bit unique in that we run quite a few WAN high latency NFS
> >>> workflows so are happy to trade long lived caches (e.g. actimeo and
> >>> even nocto on occasion) for lower ops at the expense of total
> >>> correctness.
> >>
> >> Thanks for the report. To be sure the issue doesn't fall through the
> >> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> >> tracking bot:
> >>
> >> #regzbot ^introduced 0eb43812c027
> >> #regzbot title nfs: flood of access on second log-in (first is fine)
> >> #regzbot ignore-activity
> >>
> >> This isn't a regression? This issue or a fix for it are already
> >> discussed somewhere else? It was fixed already? You want to clarify when
> >> the regression started to happen? Or point out I got the title or
> >> something else totally wrong? Then just reply and tell me -- ideally
> >> while also telling regzbot about it, as explained by the page listed in
> >> the footer of this mail.
> >>
> >> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> >> to the report (the parent of this mail). See page linked in footer for
> >> details.
> >>
> >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >> --
> >> Everything you wanna know about Linux kernel regression tracking:
> >> https://linux-regtracking.leemhuis.info/about/#tldr
> >> That page also explains what to do if mails like this annoy you.




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux