Dear All, On a system running kernel 3.0, mounting a Solaris NFS4 export, we observe a continuous 20Mbit/sec exchange between client and server that had been occurring for 10 days. >From /proc/mounts: home:/home/ /home nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.29.190.21,minorversion=0,local_lock=none,addr=172.29.120.140 0 0 Reconciling logs, we find that around the time the exchange started, the kernel reports a process having blocked: [787321.680029] INFO: task bash:17799 blocked for more than 120 seconds. [787321.680067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [787321.680104] bash D 0000000000000002 0 17799 1 0x00000000 [787321.680131] ffff880173ad9ca8 0000000000000086 ffff88017216b008 0000000000012a40 [787321.680171] ffff880173ad9fd8 0000000000012a40 ffff880173ad8000 0000000000012a40 [787321.680211] 0000000000012a40 0000000000012a40 ffff880173ad9fd8 0000000000012a40 [787321.680251] Call Trace: [787321.680275] [<ffffffff8110e900>] ? __lock_page+0x70/0x70 [787321.680299] [<ffffffff815fea2c>] io_schedule+0x8c/0xd0 [787321.680321] [<ffffffff8110e90e>] sleep_on_page+0xe/0x20 [787321.680344] [<ffffffff815ff2af>] __wait_on_bit+0x5f/0x90 [787321.680366] [<ffffffff8110ead3>] wait_on_page_bit+0x73/0x80 [787321.680390] [<ffffffff81085980>] ? autoremove_wake_function+0x40/0x40 [787321.680416] [<ffffffff8111aeb5>] ? pagevec_lookup_tag+0x25/0x40 [787321.680439] [<ffffffff8110ed06>] filemap_fdatawait_range+0xf6/0x1a0 [787321.680483] [<ffffffffa022e740>] ? nfs_destroy_directcache+0x20/0x20 [nfs] [787321.680507] [<ffffffff8111a3b1>] ? do_writepages+0x21/0x40 [787321.680530] [<ffffffff8110ff8b>] ? __filemap_fdatawrite_range+0x5b/0x60 [787321.680555] [<ffffffff81110000>] filemap_write_and_wait_range+0x70/0x80 [787321.680580] [<ffffffff8119b45a>] vfs_fsync_range+0x5a/0x90 [787321.680602] [<ffffffff8119b4fc>] vfs_fsync+0x1c/0x20 [787321.680628] [<ffffffffa0222be4>] nfs_file_flush+0x54/0x80 [nfs] [787321.680653] [<ffffffff8116d66f>] filp_close+0x3f/0x90 [787321.680675] [<ffffffff8116e097>] sys_close+0xb7/0x120 [787321.680698] [<ffffffff816090c2>] system_call_fastpath+0x16/0x1b A network trace showed the following exchange below being continually repeated. Unfortunately this system too has since been rebooted. Any thoughts on the matter would be most welcome, Kind regards, ..david No. Time Source Destination Protocol Size Info 9880 11:40:12.833617 172.29.190.21 172.29.120.140 NFS 1122 V4 COMPOUND Call (Reply In 9881) <EMPTY> PUTFH;WRITE;GETATTR Frame 9880: 1122 bytes on wire (8976 bits), 1122 bytes captured (8976 bits) Arrival Time: Oct 17, 2011 11:40:12.833617000 BST Frame Length: 1122 bytes (8976 bits) Capture Length: 1122 bytes (8976 bits) Ethernet II, Src: ChelsioC_06:68:f9 (00:07:43:06:68:f9), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be) Internet Protocol, Src: 172.29.190.21 (172.29.190.21), Dst: 172.29.120.140 (172.29.120.140) Transmission Control Protocol, Src Port: 816 (816), Dst Port: nfs (2049), Seq: 5199745, Ack: 275801, Len: 1056 Remote Procedure Call, Type:Call XID:0x5daa6e93 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Tag: <EMPTY> length: 0 contents: <EMPTY> minorversion: 0 Operations (count: 3) Opcode: PUTFH (22) filehandle length: 36 [hash (CRC-32): 0x6e4b15f3] decode type as: unknown filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300... Opcode: WRITE (38) stateid seqid: 0x00000000 Data: 4e06f15b800f82e300000000 offset: 11392 stable: FILE_SYNC4 (2) Write length: 814 Data: <DATA> length: 814 contents: <DATA> fill bytes: opaque data Opcode: GETATTR (9) GETATTR4args attr_request bitmap[0] = 0x00000018 [2 attributes requested] mand_attr: FATTR4_CHANGE (3) mand_attr: FATTR4_SIZE (4) bitmap[1] = 0x00300000 [2 attributes requested] recc_attr: FATTR4_TIME_METADATA (52) recc_attr: FATTR4_TIME_MODIFY (53) No. Time Source Destination Protocol Size Info 9881 11:40:12.833956 172.29.120.140 172.29.190.21 NFS 122 V4 COMPOUND Reply (Call In 9880) <EMPTY> PUTFH;WRITE Frame 9881: 122 bytes on wire (976 bits), 122 bytes captured (976 bits) Arrival Time: Oct 17, 2011 11:40:12.833956000 BST [Time delta from previous captured frame: 0.000339000 seconds] Frame Length: 122 bytes (976 bits) Capture Length: 122 bytes (976 bits) Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_06:68:f9 (00:07:43:06:68:f9) Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.21 (172.29.190.21) Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 816 (816), Seq: 275801, Ack: 5200801, Len: 56 Remote Procedure Call, Type:Reply XID:0x5daa6e93 Network File System [Program Version: 4] [V4 Procedure: COMPOUND (1)] Status: NFS4ERR_BAD_STATEID (10025) Tag: <EMPTY> length: 0 contents: <EMPTY> Operations (count: 2) Opcode: PUTFH (22) Status: NFS4_OK (0) Opcode: WRITE (38) Status: NFS4ERR_BAD_STATEID (10025) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html