Zoltan- As far as I know, NFSv4 is not supported on UDP transports. Some of it may work, but no warrantee, either expressed or implied, is given. :-) On Mar 29, 2011, at 10:32 AM, Menyhart Zoltan wrote: > Hi, > > NFS v4 client blocks under heavy load, when using UDP protocol. > No progress can be seen, the test program "iozone" cannot be killed. > > "nfsiostat 10 -p" reports 2 "nfs_writepages()" calls on every 10 seconds: > > server:/ mounted on /mnt/test: > op/s rpc bklog > 0.00 0.00 > read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms) > 0.000 0.000 0.000 0 (0.0%) 0.000 0.000 > write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms) > 0.000 0.000 0.000 0 (0.0%) 0.000 0.000 > 0 nfs_readpage() calls read 0 pages > 0 nfs_readpages() calls read 0 pages > 0 nfs_updatepage() calls > 0 nfs_writepage() calls wrote 0 pages > 2 nfs_writepages() calls wrote 0 pages (0.0 pages per call) > > Checking the test program with the "crash" reports: > > PID: 6185 TASK: ffff8804047020c0 CPU: 0 COMMAND: "iozone" > #0 [ffff88036ce638b8] schedule at ffffffff814d9459 > #1 [ffff88036ce63980] schedule_timeout at ffffffff814da152 > #2 [ffff88036ce63a30] io_schedule_timeout at ffffffff814d8f2f > #3 [ffff88036ce63a60] balance_dirty_pages at ffffffff811213d1 > #4 [ffff88036ce63b80] balance_dirty_pages_ratelimited_nr at ffffffff811217e4 > #5 [ffff88036ce63b90] generic_file_buffered_write at ffffffff8110d993 > #6 [ffff88036ce63c60] __generic_file_aio_write at ffffffff8110f230 > #7 [ffff88036ce63d20] generic_file_aio_write at ffffffff8110f4cf > #8 [ffff88036ce63d70] nfs_file_write at ffffffffa05e4c2e > #9 [ffff88036ce63dc0] do_sync_write at ffffffff81170bda > #10 [ffff88036ce63ef0] vfs_write at ffffffff81170ed8 > #11 [ffff88036ce63f30] sys_write at ffffffff81171911 > #12 [ffff88036ce63f80] system_call_fastpath at ffffffff8100b172 > RIP: 0000003709a0e490 RSP: 00007fffba8eac78 RFLAGS: 00000246 > RAX: 0000000000000001 RBX: ffffffff8100b172 RCX: 0000003709a0ec10 > RDX: 0000000000400000 RSI: 00007fe469600000 RDI: 0000000000000003 > RBP: 0000000000000298 R8: 0000000000000000 R9: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe469600000 > R13: 0000000000eb4c20 R14: 0000000000400000 R15: 0000000000eb4c20 > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b > > No activity can be seen with wireshark on the network. > Rebooting the client and restarting the test can show another failing point: > > PID: 4723 TASK: ffff8803fc235580 CPU: 2 COMMAND: "iozone" > #0 [ffff880417e2dc08] schedule at ffffffff814d9459 > #1 [ffff880417e2dcd0] io_schedule at ffffffff814d9c43 > #2 [ffff880417e2dcf0] sync_page at ffffffff8110d01d > #3 [ffff880417e2dd00] __wait_on_bit at ffffffff814da4af > #4 [ffff880417e2dd50] wait_on_page_bit at ffffffff8110d1d3 > #5 [ffff880417e2ddb0] wait_on_page_writeback_range at ffffffff8110d5eb > #6 [ffff880417e2deb0] filemap_write_and_wait_range at ffffffff8110d7b8 > #7 [ffff880417e2dee0] vfs_fsync_range at ffffffff8119f11e > #8 [ffff880417e2df30] vfs_fsync at ffffffff8119f1ed > #9 [ffff880417e2df40] do_fsync at ffffffff8119f22e > #10 [ffff880417e2df70] sys_fsync at ffffffff8119f280 > #11 [ffff880417e2df80] system_call_fastpath at ffffffff8100b172 > RIP: 0000003709a0ebb0 RSP: 00007fff5517dc28 RFLAGS: 00000212 > RAX: 000000000000004a RBX: ffffffff8100b172 RCX: 00000000000000ef > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 > RBP: 0000000000008000 R8: 00000000ffffffff R9: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8119f280 > R13: ffff880417e2df78 R14: 00000000011d7c30 R15: 0000000000000000 > ORIG_RAX: 000000000000004a CS: 0033 SS: 002b > > Sometimes console messages appear, like: > > INFO: task iozone:4723 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > iozone D 0000000000000002 0 4723 4683 0x00000080 > ffff880417e2dcc8 0000000000000086 ffff880417e2dc48 ffffffffa042fc99 > ffff88036e53d440 ffff8803e386aac0 ffff880402cf4610 ffff88036e53d448 > ffff8803fc235b38 ffff880417e2dfd8 000000000000f558 ffff8803fc235b38 > Call Trace: > [<ffffffffa042fc99>] ? rpc_run_task+0xd9/0x130 [sunrpc] > [<ffffffff81098aa9>] ? ktime_get_ts+0xa9/0xe0 > [<ffffffff8110cfe0>] ? sync_page+0x0/0x50 > [<ffffffff814d9c43>] io_schedule+0x73/0xc0 > [<ffffffff8110d01d>] sync_page+0x3d/0x50 > [<ffffffff814da4af>] __wait_on_bit+0x5f/0x90 > [<ffffffff8110d1d3>] wait_on_page_bit+0x73/0x80 > [<ffffffff8108df40>] ? wake_bit_function+0x0/0x50 > [<ffffffff81122f25>] ? pagevec_lookup_tag+0x25/0x40 > [<ffffffff8110d5eb>] wait_on_page_writeback_range+0xfb/0x190 > [<ffffffff8110d7b8>] filemap_write_and_wait_range+0x78/0x90 > [<ffffffff8119f11e>] vfs_fsync_range+0x7e/0xe0 > [<ffffffff8119f1ed>] vfs_fsync+0x1d/0x20 > [<ffffffff8119f22e>] do_fsync+0x3e/0x60 > [<ffffffff8119f280>] sys_fsync+0x10/0x20 > [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b > > This problem can be systematically reproduced with the following test: > > iozone -a -c -e -E -L 64 -m -S 4096 -f /mnt/test/file -g 20G -n 1M -y 4K > > It can be reproduced with both type of networks we have, see below. > > Switching on RPC debugging with "sysctl -w sunrpc.rpc_debug=65535" gives: > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > 28139 0001 -11 ffff88040f403400 (null) 0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog > 29167 0001 -11 ffff88040f403400 ffff8803ff405130 0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend > 29172 0001 -11 ffff88040f403400 (null) 0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog > ... > 51364 0001 -11 ffff88040f403400 (null) 0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog > 55355 0001 -11 ffff88040f403400 ffff8803ff405450 0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend > 55356 0001 -11 ffff88040f403400 ffff8803ff404000 0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend > 55357 0001 -11 ffff88040f403400 ffff8803ff404c80 0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend > 55369 0001 -11 ffff88040f403400 ffff8803ff4055e0 0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend > ... > 55370 0001 -11 ffff88040f403400 (null) 0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog > 55371 0001 -11 ffff88040f403400 (null) 0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog > ... > > No error messages in "dmesg" neither in "/var/log/messages" at the server side. > > > Our test HW: > 2 * Supermicro R460, 2 dual core Xeons 5150 2.66GHz in each machine > InfiniBand Mellanox Technologies MT26418 in DDR mode, IP over IB > Intel Corporation 80003ES2LAN Gigabit Ethernet Controller, point to point direct connection > > SW: > Kernels up to 2.6.38-rc8 > > > Server side: > > A SW RAID "/dev/md0" mounted on "/md0" is exported: > > /dev/md0: > Version : 0.90 > Creation Time : Wed Feb 18 17:18:34 2009 > Raid Level : raid0 > Array Size : 357423360 (340.87 GiB 366.00 GB) > Raid Devices : 5 > Total Devices : 5 > Preferred Minor : 0 > Persistence : Superblock is persistent > Update Time : Wed Feb 18 17:18:34 2009 > State : clean > Active Devices : 5 > Working Devices : 5 > Failed Devices : 0 > Spare Devices : 0 > Chunk Size : 256K > UUID : 05fae940:c7540b1e:d0f324d8:8f753e72 > Events : 0.1 > > Number Major Minor RaidDevice State > 0 8 16 0 active sync /dev/sdb > 1 8 32 1 active sync /dev/sdc > 2 8 48 2 active sync /dev/sdd > 3 8 64 3 active sync /dev/sde > 4 8 80 4 active sync /dev/sdf > > /etc/exports: > /md0 *(rw,insecure,no_subtree_check,sync,fsid=0) > > The client machine mounts it as follows: > > mount -o async,udp 192.168.1.79:/ /mnt/test # Ethernet > or > mount -o async,udp 192.168.42.79:/ /mnt/test # IP over IB > > > Have you already seen this problem? > Thank you in advance, > > Zoltan Menyhart > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html