Re: NFS regression between 5.17 and 5.18

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dennis,

Can I ask some basic questions? Have you tried to get any kinds of
profiling done to see where the client is spending time (using perf
perhaps)?

real    4m11.835s
user    0m0.001s
sys     0m0.277s

sounds like 4ms are spent sleeping somewhere? Did it take 4mins to do
a network transfer (if we had a network trace we could see how long
network transfer were)? Do you have one (that goes along with
something that can tell us approximately when the request began from
the cp's perspective, like a date before hand)?

I see that there were no rdma changes that went into 5.18 kernel so
whatever changed either a generic nfs behaviour or perhaps something
in the rdma core code (is an mellonax card being used here?)

I wonder if the slowdown only happens on rdma or is it visible on the
tcp mount as well, have you tried?



On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
<dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 6/20/22 10:40 AM, Chuck Lever III wrote:
> > Hi Thorsten-
> >
> >> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> wrote:
> >>
> >> On 20.06.22 16:11, Chuck Lever III wrote:
> >>>
> >>>
> >>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
> >>>> What's the status? Has any progress been made? Or is this not really a
> >>>> regression and can be ignored?
> >>>>
> >>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>>
> >>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> >>>> reports and sometimes miss something important when writing mails like
> >>>> this. If that's the case here, don't hesitate to tell me in a public
> >>>> reply, it's in everyone's interest to set the public record straight.
> >>>>
> >>>> #regzbot poke
> >>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
> >>>
> >>> The above link points to an Apple trackpad bug.
> >>
> >> Yeah, I know, sorry, should have mentioned: either I or my bot did
> >> something stupid and associated that report with this regression, that's
> >> why I deassociated it with the "unlink" command.
> >
> > Is there an open bugzilla for the original regression?
> >
> >
> >>> The bug described all the way at the bottom was the origin problem
> >>> report. I believe this is an NFS client issue. We are waiting for
> >>> a response from the NFS client maintainers to help Dennis track
> >>> this down.
> >>
> >> Many thx for the status update. Can anything be done to speed things up?
> >> This is taken quite a long time already -- way longer that outlined in
> >> "Prioritize work on fixing regressions" here:
> >> https://docs.kernel.org/process/handling-regressions.html
> >
> > ENOTMYMONKEYS ;-)
> >
> > I was involved to help with the ^C issue that happened while
> > Dennis was troubleshooting. It's not related to the original
> > regression, which needs to be pursued by the NFS client
> > maintainers.
> >
> > The correct people to poke are Trond, Olga (both cc'd) and
> > Anna Schumaker.
>
> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
> result of the test we were running taking too long. It times out after 10
> minutes or so and kills the process. So a downstream effect of the regression.
>
> The test is still continuing to fail as of 5.19-rc2. I'll double check that it's
> the same issue and open a bugzilla.
>
> Thanks for poking at this.
>
> -Denny



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux