Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators

Hi

On 03/12/2018 11:45, Catalin Marinas wrote:
> Hi Trond,
> 
> On Sun, Dec 02, 2018 at 04:44:49PM +0000, Trond Myklebust wrote:
>> On Fri, 2018-11-30 at 14:31 -0500, Trond Myklebust wrote:
>>> On Fri, 2018-11-30 at 16:19 +0000, Cristian Marussi wrote:
>>>> On 29/11/2018 19:56, Trond Myklebust wrote:
>>>>> On Thu, 2018-11-29 at 19:28 +0000, Cristian Marussi wrote:
>>>>> Question to you both: when this happens, does /proc/*/stack show
>>>>> any of the processes hanging in the socket or sunrpc code? If
>>>>> so, can you please send me examples of those stack traces (i.e.
>>>>> the contents of /proc/<pid>/stack for the processes that are
>>>>> hanging)
>>>>
>>>> (using a reverse shell since starting ssh causes a lot of pain and
>>>> traffic)
>>>>
>>>> Looking at NFS traffic holes(30-40 secs) to detect Client side
>>>> various HANGS
>>
>> Chuck and I have identified a few issues that might have an effect on
>> the hangs you report. Could you please give the linux-next branch in my
>> repository on git.linux-nfs.org (
>> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/linux-next
>> ) a try?
>>
>> git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
> 
> I tried, unfortunately there's no difference for me (I merged the above
> branch on top of 4.20-rc5).
> 

Same for me: the issue is still there.

Besides that, I saw some differences in the results from dbench, which I used for testing.

From the dbench output (compared with my previous mail), it seems that the
Unlink and Qpathinfo MaxLat values have normalized.

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      90820    13.613 13855.620
 Close          66565    18.075 13853.289
 Rename          3845    23.668   326.642
 Unlink         18450     4.581   186.062
 Qpathinfo      82068     2.677   280.203
 Qfileinfo      14235    10.357   176.373
 Qfsinfo        15156     2.822   242.794
 Sfileinfo       7400    17.018   240.546
 Find           31812     5.988   277.332
 WriteX         44735     0.155    14.685
 ReadX         141872     0.741 13817.870
 LockX            288    10.558    96.179
 UnlockX          288     3.307    57.939
 Flush           6389    20.427   187.429
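To make the outliers in the table above easier to spot, here is a minimal sketch that parses the dbench summary and lists the operations whose MaxLat exceeds a cutoff. The 1000 cutoff is an arbitrary value picked for illustration, the latencies are assumed to be in milliseconds, and the helper names (parse_dbench, slow_ops) are hypothetical, not part of dbench.

```python
import re

# dbench summary copied from the table above.
DBENCH_OUTPUT = """\
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      90820    13.613 13855.620
 Close          66565    18.075 13853.289
 Rename          3845    23.668   326.642
 Unlink         18450     4.581   186.062
 Qpathinfo      82068     2.677   280.203
 Qfileinfo      14235    10.357   176.373
 Qfsinfo        15156     2.822   242.794
 Sfileinfo       7400    17.018   240.546
 Find           31812     5.988   277.332
 WriteX         44735     0.155    14.685
 ReadX         141872     0.741 13817.870
 LockX            288    10.558    96.179
 UnlockX          288     3.307    57.939
 Flush           6389    20.427   187.429
"""

# One data row: operation name, integer count, two decimal latencies.
# The header and the dashed separator do not match and are skipped.
ROW = re.compile(r"^\s*(\w+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_dbench(text):
    """Return {operation: (count, avg_lat, max_lat)} for every data row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            op, count, avg_lat, max_lat = match.groups()
            rows[op] = (int(count), float(avg_lat), float(max_lat))
    return rows


def slow_ops(rows, threshold=1000.0):
    """Return the sorted names of operations whose MaxLat exceeds threshold.

    The default threshold is an assumption chosen for illustration only.
    """
    return sorted(op for op, (_, _, max_lat) in rows.items() if max_lat > threshold)


if __name__ == "__main__":
    print(slow_ops(parse_dbench(DBENCH_OUTPUT)))
```

With the numbers above, only NTCreateX, Close and ReadX exceed the cutoff, each with a MaxLat of roughly 13.8 thousand.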


> Is there anything else blocked in the RPC layer? The above are all
> standard tasks waiting for the rpciod/xprtiod workqueues to complete
> the calls to the server.

cat /proc/692/stack
[<0>] __switch_to+0x6c/0x90
[<0>] rescuer_thread+0x2e8/0x360
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff
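
For checking many tasks at once rather than one pid at a time, the following is a rough sketch that reads /proc/<pid>/stack and reports whether any frame looks like socket or sunrpc code. The function names are hypothetical, and the symbol prefixes (rpc_, xprt_, xs_, sock_) are only an assumed heuristic for "socket or sunrpc code", not an exhaustive list. Reading /proc/<pid>/stack normally requires root.

```python
import os

# Assumed heuristic: symbol prefixes that suggest sunrpc or socket code.
RPC_PREFIXES = ("rpc_", "xprt_", "xs_", "sock_")


def read_stack(pid, proc_root="/proc"):
    """Return the text of /proc/<pid>/stack, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None


def parse_frames(stack_text):
    """Extract symbol names from lines such as '[<0>] kthread+0x134/0x138'."""
    frames = []
    for line in stack_text.splitlines():
        parts = line.split("] ", 1)
        if len(parts) != 2:
            continue
        symbol = parts[1].strip()
        if symbol.startswith("0x"):
            continue  # raw address terminator, not a symbol
        frames.append(symbol.split("+", 1)[0])
    return frames


def in_rpc_code(frames, prefixes=RPC_PREFIXES):
    """Return True if any frame starts with one of the given prefixes."""
    return any(frame.startswith(prefixes) for frame in frames)


def scan_all(proc_root="/proc"):
    """Yield (pid, frames) for every task whose stack matches the heuristic."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        text = read_stack(entry, proc_root)
        if text and in_rpc_code(parse_frames(text)):
            yield int(entry), parse_frames(text)


# Stack of pid 692 as shown above.
SAMPLE = """\
[<0>] __switch_to+0x6c/0x90
[<0>] rescuer_thread+0x2e8/0x360
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff
"""

if __name__ == "__main__":
    print(parse_frames(SAMPLE), in_rpc_code(parse_frames(SAMPLE)))
```

Under this heuristic, the stack of pid 692 above does not match: it shows only an idle rescuer thread.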

I am now trying to collect more evidence by ftracing during the quiet, stuck
period until the restart happens.

Thanks

Cristian
