On 6/21/22 12:04 PM, Olga Kornievskaia wrote:
> Hi Dennis,
>
> Can I ask some basic questions? Have you tried to get any kind of
> profiling done to see where the client is spending time (using perf
> perhaps)?
>
> real 4m11.835s
> user 0m0.001s
> sys 0m0.277s
>
> sounds like 4 mins are spent sleeping somewhere? Did it take 4 mins to do
> a network transfer (if we had a network trace we could see how long
> the network transfers were)? Do you have one (that goes along with
> something that can tell us approximately when the request began from
> the cp's perspective, like a date beforehand)?
>
> I see that there were no rdma changes that went into the 5.18 kernel so
> whatever changed is either generic nfs behaviour or perhaps something
> in the rdma core code (is a Mellanox card being used here?)
>
> I wonder if the slowdown only happens on rdma or is it visible on the
> tcp mount as well, have you tried?
>

Hi Olga,

I have opened a Kernel Bugzilla if you would rather log future responses there:
https://bugzilla.kernel.org/show_bug.cgi?id=216160

To answer your above questions: This is on Omni-Path hardware. I have not tried
the TCP mount, but I can. I don't have a network trace per se or a profile. We
don't support a TCP dump or anything like that. However, I can tell you there is
nothing going over the network while it appears to be hung. I can monitor the
packet counters.

If you have some ideas where I could put some trace points that could tell us
something, I can certainly add those.

-Denny

>
>
> On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
> <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 6/20/22 10:40 AM, Chuck Lever III wrote:
>>> Hi Thorsten-
>>>
>>>> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> wrote:
>>>>
>>>> On 20.06.22 16:11, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Dennis, Chuck, I have the below issue on the list of tracked regressions.
>>>>>> What's the status? Has any progress been made? Or is this not really a
>>>>>> regression and can be ignored?
>>>>>>
>>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>>>
>>>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>>>>> reports and sometimes miss something important when writing mails like
>>>>>> this. If that's the case here, don't hesitate to tell me in a public
>>>>>> reply, it's in everyone's interest to set the public record straight.
>>>>>>
>>>>>> #regzbot poke
>>>>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
>>>>>
>>>>> The above link points to an Apple trackpad bug.
>>>>
>>>> Yeah, I know, sorry, should have mentioned: either I or my bot did
>>>> something stupid and associated that report with this regression, that's
>>>> why I deassociated it with the "unlink" command.
>>>
>>> Is there an open bugzilla for the original regression?
>>>
>>>
>>>>> The bug described all the way at the bottom was the original problem
>>>>> report. I believe this is an NFS client issue. We are waiting for
>>>>> a response from the NFS client maintainers to help Dennis track
>>>>> this down.
>>>>
>>>> Many thx for the status update. Can anything be done to speed things up?
>>>> This has taken quite a long time already -- way longer than outlined in
>>>> "Prioritize work on fixing regressions" here:
>>>> https://docs.kernel.org/process/handling-regressions.html
>>>
>>> ENOTMYMONKEYS ;-)
>>>
>>> I was involved to help with the ^C issue that happened while
>>> Dennis was troubleshooting. It's not related to the original
>>> regression, which needs to be pursued by the NFS client
>>> maintainers.
>>>
>>> The correct people to poke are Trond, Olga (both cc'd) and
>>> Anna Schumaker.
>>
>> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
>> result of the test we were running taking too long. It times out after 10
>> minutes or so and kills the process. So it is a downstream effect of the
>> regression.
>>
>> The test is still failing as of 5.19-rc2. I'll double check that it's
>> the same issue and open a bugzilla.
>>
>> Thanks for poking at this.
>>
>> -Denny