Re: NFS4 BAD_STATEID loop (kernel 3.0)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Oct 24, 2011, at 6:40 AM, David Flynn wrote:

> Dear All,
> 
> On a system running kernel 3.0, mounting a Solaris NFS4 export, we
> observe a continuous 20Mbit/sec exchange between client and server that had
> been occurring for 10 days.

Can you tell us a little more about the server?  Which release of Solaris?  What hardware?

> From /proc/mounts:
> home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.21,minorversion=0,local_lock=none,addr=172.29.120.140 0 0
> 
> Reconciling logs, we find that around the time the exchange started,
> the kernel reports a process having blocked:
> 
> [787321.680029] INFO: task bash:17799 blocked for more than 120 seconds.
> [787321.680067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [787321.680104] bash            D 0000000000000002     0 17799      1 0x00000000
> [787321.680131]  ffff880173ad9ca8 0000000000000086 ffff88017216b008 0000000000012a40
> [787321.680171]  ffff880173ad9fd8 0000000000012a40 ffff880173ad8000 0000000000012a40
> [787321.680211]  0000000000012a40 0000000000012a40 ffff880173ad9fd8 0000000000012a40
> [787321.680251] Call Trace:
> [787321.680275]  [<ffffffff8110e900>] ? __lock_page+0x70/0x70
> [787321.680299]  [<ffffffff815fea2c>] io_schedule+0x8c/0xd0
> [787321.680321]  [<ffffffff8110e90e>] sleep_on_page+0xe/0x20
> [787321.680344]  [<ffffffff815ff2af>] __wait_on_bit+0x5f/0x90
> [787321.680366]  [<ffffffff8110ead3>] wait_on_page_bit+0x73/0x80
> [787321.680390]  [<ffffffff81085980>] ? autoremove_wake_function+0x40/0x40
> [787321.680416]  [<ffffffff8111aeb5>] ? pagevec_lookup_tag+0x25/0x40
> [787321.680439]  [<ffffffff8110ed06>] filemap_fdatawait_range+0xf6/0x1a0
> [787321.680483]  [<ffffffffa022e740>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
> [787321.680507]  [<ffffffff8111a3b1>] ? do_writepages+0x21/0x40
> [787321.680530]  [<ffffffff8110ff8b>] ? __filemap_fdatawrite_range+0x5b/0x60
> [787321.680555]  [<ffffffff81110000>] filemap_write_and_wait_range+0x70/0x80
> [787321.680580]  [<ffffffff8119b45a>] vfs_fsync_range+0x5a/0x90
> [787321.680602]  [<ffffffff8119b4fc>] vfs_fsync+0x1c/0x20
> [787321.680628]  [<ffffffffa0222be4>] nfs_file_flush+0x54/0x80 [nfs]
> [787321.680653]  [<ffffffff8116d66f>] filp_close+0x3f/0x90
> [787321.680675]  [<ffffffff8116e097>] sys_close+0xb7/0x120
> [787321.680698]  [<ffffffff816090c2>] system_call_fastpath+0x16/0x1b
> 
> A network trace showed the following exchange below being continually
> repeated.  Unfortunately this system too has since been rebooted.
> 
> Any thoughts on the matter would be most welcome,
> 
> Kind regards,
> 
> ..david
> 
> 
> No.     Time            Source                Destination           Protocol Size  Info
>   9880 11:40:12.833617 172.29.190.21         172.29.120.140        NFS      1122  V4 COMPOUND Call (Reply In 9881) <EMPTY> PUTFH;WRITE;GETATTR
> 
> Frame 9880: 1122 bytes on wire (8976 bits), 1122 bytes captured (8976 bits)
>    Arrival Time: Oct 17, 2011 11:40:12.833617000 BST
>    Frame Length: 1122 bytes (8976 bits)
>    Capture Length: 1122 bytes (8976 bits)
> Ethernet II, Src: ChelsioC_06:68:f9 (00:07:43:06:68:f9), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
> Internet Protocol, Src: 172.29.190.21 (172.29.190.21), Dst: 172.29.120.140 (172.29.120.140)
> Transmission Control Protocol, Src Port: 816 (816), Dst Port: nfs (2049), Seq: 5199745, Ack: 275801, Len: 1056
> Remote Procedure Call, Type:Call XID:0x5daa6e93
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: COMPOUND (1)]
>    Tag: <EMPTY>
>        length: 0
>        contents: <EMPTY>
>    minorversion: 0
>    Operations (count: 3)
>        Opcode: PUTFH (22)
>            filehandle
>                length: 36
>                [hash (CRC-32): 0x6e4b15f3]
>                decode type as: unknown
>                filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
>        Opcode: WRITE (38)
>            stateid
>                seqid: 0x00000000
>                Data: 4e06f15b800f82e300000000
>            offset: 11392
>            stable: FILE_SYNC4 (2)
>            Write length: 814
>            Data: <DATA>
>                length: 814
>                contents: <DATA>
>                fill bytes: opaque data
>        Opcode: GETATTR (9)
>            GETATTR4args
>                attr_request
>                    bitmap[0] = 0x00000018
>                        [2 attributes requested]
>                        mand_attr: FATTR4_CHANGE (3)
>                        mand_attr: FATTR4_SIZE (4)
>                    bitmap[1] = 0x00300000
>                        [2 attributes requested]
>                        recc_attr: FATTR4_TIME_METADATA (52)
>                        recc_attr: FATTR4_TIME_MODIFY (53)
> 
> No.     Time            Source                Destination           Protocol Size  Info
>   9881 11:40:12.833956 172.29.120.140        172.29.190.21         NFS      122   V4 COMPOUND Reply (Call In 9880) <EMPTY> PUTFH;WRITE
> 
> Frame 9881: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
>    Arrival Time: Oct 17, 2011 11:40:12.833956000 BST
>    [Time delta from previous captured frame: 0.000339000 seconds]
>    Frame Length: 122 bytes (976 bits)
>    Capture Length: 122 bytes (976 bits)
> Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_06:68:f9 (00:07:43:06:68:f9)
> Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.21 (172.29.190.21)
> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 816 (816), Seq: 275801, Ack: 5200801, Len: 56
> Remote Procedure Call, Type:Reply XID:0x5daa6e93
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: COMPOUND (1)]
>    Status: NFS4ERR_BAD_STATEID (10025)
>    Tag: <EMPTY>
>        length: 0
>        contents: <EMPTY>
>    Operations (count: 2)
>        Opcode: PUTFH (22)
>            Status: NFS4_OK (0)
>        Opcode: WRITE (38)
>            Status: NFS4ERR_BAD_STATEID (10025)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux