NFS v4 client blocks when using UDP protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

NFS v4 client blocks under heavy load, when using UDP protocol.
No progress can be seen, the test program "iozone" cannot be killed.

"nfsiostat 10 -p" reports 2 "nfs_writepages()" calls on every 10 seconds:

server:/ mounted on /mnt/test:
   op/s		rpc bklog
   0.00 	   0.00
read:             ops/s		   kB/s		  kB/op		retrans		avg RTT (ms)	avg exe (ms)
		  0.000 	  0.000 	  0.000        0 (0.0%) 	  0.000 	  0.000
write:            ops/s		   kB/s		  kB/op		retrans		avg RTT (ms)	avg exe (ms)
		  0.000 	  0.000 	  0.000        0 (0.0%) 	  0.000 	  0.000
0 nfs_readpage() calls read 0 pages
0 nfs_readpages() calls read 0 pages
0 nfs_updatepage() calls
0 nfs_writepage() calls wrote 0 pages
2 nfs_writepages() calls wrote 0 pages (0.0 pages per call)

Checking the test program with the "crash" reports:

PID: 6185   TASK: ffff8804047020c0  CPU: 0   COMMAND: "iozone"
 #0 [ffff88036ce638b8] schedule at ffffffff814d9459
 #1 [ffff88036ce63980] schedule_timeout at ffffffff814da152
 #2 [ffff88036ce63a30] io_schedule_timeout at ffffffff814d8f2f
 #3 [ffff88036ce63a60] balance_dirty_pages at ffffffff811213d1
 #4 [ffff88036ce63b80] balance_dirty_pages_ratelimited_nr at ffffffff811217e4
 #5 [ffff88036ce63b90] generic_file_buffered_write at ffffffff8110d993
 #6 [ffff88036ce63c60] __generic_file_aio_write at ffffffff8110f230
 #7 [ffff88036ce63d20] generic_file_aio_write at ffffffff8110f4cf
 #8 [ffff88036ce63d70] nfs_file_write at ffffffffa05e4c2e
 #9 [ffff88036ce63dc0] do_sync_write at ffffffff81170bda
#10 [ffff88036ce63ef0] vfs_write at ffffffff81170ed8
#11 [ffff88036ce63f30] sys_write at ffffffff81171911
#12 [ffff88036ce63f80] system_call_fastpath at ffffffff8100b172
    RIP: 0000003709a0e490  RSP: 00007fffba8eac78  RFLAGS: 00000246
    RAX: 0000000000000001  RBX: ffffffff8100b172  RCX: 0000003709a0ec10
    RDX: 0000000000400000  RSI: 00007fe469600000  RDI: 0000000000000003
    RBP: 0000000000000298   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fe469600000
    R13: 0000000000eb4c20  R14: 0000000000400000  R15: 0000000000eb4c20
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

No activity can be seen with wireshark on the network.
Rebooting the client and restarting the test can show another failing point:

PID: 4723   TASK: ffff8803fc235580  CPU: 2   COMMAND: "iozone"
 #0 [ffff880417e2dc08] schedule at ffffffff814d9459
 #1 [ffff880417e2dcd0] io_schedule at ffffffff814d9c43
 #2 [ffff880417e2dcf0] sync_page at ffffffff8110d01d
 #3 [ffff880417e2dd00] __wait_on_bit at ffffffff814da4af
 #4 [ffff880417e2dd50] wait_on_page_bit at ffffffff8110d1d3
 #5 [ffff880417e2ddb0] wait_on_page_writeback_range at ffffffff8110d5eb
 #6 [ffff880417e2deb0] filemap_write_and_wait_range at ffffffff8110d7b8
 #7 [ffff880417e2dee0] vfs_fsync_range at ffffffff8119f11e
 #8 [ffff880417e2df30] vfs_fsync at ffffffff8119f1ed
 #9 [ffff880417e2df40] do_fsync at ffffffff8119f22e
#10 [ffff880417e2df70] sys_fsync at ffffffff8119f280
#11 [ffff880417e2df80] system_call_fastpath at ffffffff8100b172
    RIP: 0000003709a0ebb0  RSP: 00007fff5517dc28  RFLAGS: 00000212
    RAX: 000000000000004a  RBX: ffffffff8100b172  RCX: 00000000000000ef
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
    RBP: 0000000000008000   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: ffffffff8119f280
    R13: ffff880417e2df78  R14: 00000000011d7c30  R15: 0000000000000000
    ORIG_RAX: 000000000000004a  CS: 0033  SS: 002b

Sometimes console messages appear, like:

INFO: task iozone:4723 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
iozone        D 0000000000000002     0  4723   4683 0x00000080
ffff880417e2dcc8 0000000000000086 ffff880417e2dc48 ffffffffa042fc99
ffff88036e53d440 ffff8803e386aac0 ffff880402cf4610 ffff88036e53d448
ffff8803fc235b38 ffff880417e2dfd8 000000000000f558 ffff8803fc235b38
Call Trace:
[<ffffffffa042fc99>] ? rpc_run_task+0xd9/0x130 [sunrpc]
[<ffffffff81098aa9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8110cfe0>] ? sync_page+0x0/0x50
[<ffffffff814d9c43>] io_schedule+0x73/0xc0
[<ffffffff8110d01d>] sync_page+0x3d/0x50
[<ffffffff814da4af>] __wait_on_bit+0x5f/0x90
[<ffffffff8110d1d3>] wait_on_page_bit+0x73/0x80
[<ffffffff8108df40>] ? wake_bit_function+0x0/0x50
[<ffffffff81122f25>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffff8110d5eb>] wait_on_page_writeback_range+0xfb/0x190
[<ffffffff8110d7b8>] filemap_write_and_wait_range+0x78/0x90
[<ffffffff8119f11e>] vfs_fsync_range+0x7e/0xe0
[<ffffffff8119f1ed>] vfs_fsync+0x1d/0x20
[<ffffffff8119f22e>] do_fsync+0x3e/0x60
[<ffffffff8119f280>] sys_fsync+0x10/0x20
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

This problem can be systematically reproduced with the following test:

iozone -a -c -e -E -L 64 -m -S 4096 -f /mnt/test/file -g 20G -n 1M -y 4K

It can be reproduced with both type of networks we have, see below.

Switching on RPC debugging with "sysctl -w sunrpc.rpc_debug=65535" gives:

-pid- flgs status -client- --rqstp- -timeout ---ops--
28139 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
29167 0001    -11 ffff88040f403400 ffff8803ff405130        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
29172 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
...
51364 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
55355 0001    -11 ffff88040f403400 ffff8803ff405450        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55356 0001    -11 ffff88040f403400 ffff8803ff404000        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55357 0001    -11 ffff88040f403400 ffff8803ff404c80        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55369 0001    -11 ffff88040f403400 ffff8803ff4055e0        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
...
55370 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
55371 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
...

No error messages in "dmesg" neither in "/var/log/messages" at the server side.


Our test HW:
2 * Supermicro R460, 2 dual core Xeons 5150 2.66GHz in each machine
InfiniBand Mellanox Technologies MT26418 in DDR mode, IP over IB
Intel Corporation 80003ES2LAN Gigabit Ethernet Controller, point to point direct connection

SW:
Kernels up to 2.6.38-rc8


Server side:

A SW RAID "/dev/md0" mounted on "/md0" is exported:

/dev/md0:
         Version : 0.90
   Creation Time : Wed Feb 18 17:18:34 2009
      Raid Level : raid0
      Array Size : 357423360 (340.87 GiB 366.00 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 0
     Persistence : Superblock is persistent
     Update Time : Wed Feb 18 17:18:34 2009
           State : clean
  Active Devices : 5
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0
      Chunk Size : 256K
            UUID : 05fae940:c7540b1e:d0f324d8:8f753e72
          Events : 0.1

     Number   Major   Minor   RaidDevice State
        0       8       16        0      active sync   /dev/sdb
        1       8       32        1      active sync   /dev/sdc
        2       8       48        2      active sync   /dev/sdd
        3       8       64        3      active sync   /dev/sde
        4       8       80        4      active sync   /dev/sdf

/etc/exports:
/md0 *(rw,insecure,no_subtree_check,sync,fsid=0)

The client machine mounts it as follows:

mount -o async,udp 192.168.1.79:/ /mnt/test	# Ethernet
or
mount -o async,udp 192.168.42.79:/ /mnt/test	# IP over IB


Have you already seen this problem?
Thank you in advance,

Zoltan Menyhart

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux