Re: NFS3 subsystem hung, Kernel alive

On Thu, Sep 20, 2018 at 10:52:17AM +0000, Jäkel, Guido wrote:
> Hi all,
> 
> Today, at about "the event time", production kept running, but I discovered that one of the hosts in the Test stage (bladerunner10) had become very "stuttering" in reacting to commands.
> 
> From  https://utcc.utoronto.ca/~cks/space/blog/linux/NFSMountstatsXprt  I got some information about this, and I started to run
> 
> 	watch -n 1 "sed -n '/^device .* on \/ with/,/^$/ p'  /proc/self/mountstats"
> 
> on the hosts to watch the root mount. On  bladerunner10  I noticed a very high value in the 8th field of xprt ('bad XIDs'), which is identical to the difference between filed 6 and 7 (TX-RX). Does that mean that there was a high number of bad answers to questions? Or is this the number of replies that arrived too late?

I don't know what you mean by "filed 6 and 7".  Oh, wait, I guess you're
talking about the 6th and 7th fields of the "xprt" line in mountstats.

bad_xids means the client got a response but couldn't find a matching
request.  I'm not sure why that would happen--maybe a response came after
the client gave up waiting for it?
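
For reference, the arithmetic on the "xprt" line can be sketched in a few
lines of Python.  This is only an illustration, not a tool that ships with
nfs-utils: the field names follow the blog post linked in the quoted message,
the function name parse_tcp_xprt and the key tx_rx_bad are made up here, and
only the first eight numeric fields of a TCP line are interpreted.

```python
# Names for the first eight numbers after "xprt: tcp".
# Assumption: order as described in the blog post cited in the quoted mail.
TCP_XPRT_FIELDS = (
    "port", "bind_count", "connect_count", "connect_time", "idle_time",
    "sends", "recvs", "bad_xids",
)

def parse_tcp_xprt(line):
    """Parse one 'xprt: tcp ...' line from /proc/self/mountstats."""
    tokens = line.split()
    if tokens[:2] != ["xprt:", "tcp"]:
        raise ValueError("not a TCP xprt line")
    values = [int(t) for t in tokens[2:]]
    # zip() stops at the shorter sequence, so trailing fields are ignored.
    stats = dict(zip(TCP_XPRT_FIELDS, values))
    # Hypothetical helper value: the "TX-RX-BAD" figure from the quoted mail.
    stats["tx_rx_bad"] = stats["sends"] - stats["recvs"] - stats["bad_xids"]
    return stats

# The xprt line from the mountstats output quoted below.
sample = ("xprt:   tcp 837 1 1 0 0 21448220350 21448165066 55284 "
          "576287654630121 0 34712 845220323041 514256914035")
stats = parse_tcp_xprt(sample)
print(stats["sends"] - stats["recvs"], stats["bad_xids"], stats["tx_rx_bad"])
```

On the quoted sample, sends minus recvs equals bad_xids (55284), so the
TX-RX-BAD value is 0 for that snapshot.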

--b.

> 
> If I watch TX-RX-BAD, this is near zero on all hosts. But on bladerunner10, it sometimes rises to enormous values (>100000), and at that moment all file I/O is frozen. E.g., I don't get a new prompt if I simply hit Enter on a bash command line.
> 
> 
> 
> device 10.69.63.196:/02/q/diskless/roots/bladerunner10 mounted on / with fstype nfs statvers=1.1
>         opts:   rw,vers=3,rsize=1024,wsize=1024,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.69.63.196,mountvers=3,mountport=0,mountproto=tcp,local_lock=all
>         age:    9939702
>         caps:   caps=0x3fc7,wtmult=512,dtsize=1024,bsize=0,namlen=255
>         sec:    flavor=1,pseudoflavor=1
>         events: 269343924 134739087308 20734 140915 232195524 79262 134886538148 21804722 104 16067 0 293341786 222190 75356 177067969 35796 2826 231908027 0 411 21783902 199 0 0 0 0 0 
>         bytes:  128654830696 20320953759 0 0 219517679 20415228955 63772 5008821 
>         RPC iostats version: 1.0  p/v: 100003/3 (nfs)
>         xprt:   tcp 837 1 1 0 0 21448220350 21448165066 55284 576287654630121 0 34712 845220323041 514256914035
>         per-op statistics
>                 NULL: 0 0 0 0 0 0 0 0
>              GETATTR: 269343899 269343899 0 36809071916 30166513552 3034498 71578350 78080492
>              SETATTR: 75721 75721 0 15972628 10903824 1855 70284 73720
>               LOOKUP: 80296 80296 0 15825484 18814360 7312 135951 144678
>               ACCESS: 39274 39274 0 7048052 4712880 4241 26485 31274
>             READLINK: 995 995 0 170796 139564 72 479 567
>                 READ: 223945 223945 0 40327228 248198116 130225 1437810 1583172
>                WRITE: 19958985 19958985 0 24406783848 3193437600 167421458404 27086586679 194511012992
>               CREATE: 5281 5281 0 1126060 1542052 132 21698 21989
>                MKDIR: 127 127 0 29160 36740 10 12307 12321
>              SYMLINK: 3 3 0 716 876 0 1 1
>                MKNOD: 3 3 0 636 876 0 2 2
>               REMOVE: 3400 3400 0 663604 489600 52 12164 12312
>                RMDIR: 122 122 0 24624 17520 15 463 483
>               RENAME: 2074 2074 0 491352 539240 67 11433 11529
>                 LINK: 0 0 0 0 0 0 0 0
>              READDIR: 31882 31882 0 6376400 32311036 2707 64806 68379
>          READDIRPLUS: 273882 273882 0 55807876 140884360 14257 509826 530894
>               FSSTAT: 538 538 0 95212 90384 61 445 519
>               FSINFO: 2 2 0 272 328 0 0 0
>             PATHCONF: 1 1 0 136 140 0 0 0
>               COMMIT: 0 0 0 0 0 0 0 0
> 
> 


