Dear List,

we are having problems with Oracle backups to a CIFS share on two nodes:

1) Linux host: RHEL 5U6, kernel 2.6.18-238.el5

modinfo cifs:
filename: /lib/modules/2.6.18-238.el5/kernel/fs/cifs/cifs.ko
version: 1.60RH
description: VFS to access servers complying with the SNIA CIFS Specification e.g. Samba and Windows
license: GPL
author: Steve French <sfrench@xxxxxxxxxx>
srcversion: 9326EB4A9ECCBAE9A62AE8B
depends:
vermagic: 2.6.18-238.el5 SMP mod_unload gcc-4.1
parm: CIFSMaxBufSize:Network buffer size (not including header). Default: 16384 Range: 8192 to 130048 (int)
parm: cifs_min_rcv:Network buffers in pool. Default: 4 Range: 1 to 64 (int)
parm: cifs_min_small:Small network buffers in pool. Default: 30 Range: 2 to 256 (int)
parm: cifs_max_pending:Simultaneous requests to server. Default: 50 Range: 2 to 256 (int)
module_sig: 883f3504de5f29f43e909ab54b946c1123a1f0a09fbdf3d22c0ce64d38a5d1cf3210bdaf264a20a0b4c43be6224ac24697783fb0e98629bab7cd59

/var/log/messages:
Feb 3 01:05:30 myhost kernel: CIFS VFS: No response to cmd 5 mid 39189
Feb 3 01:05:30 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 01:05:51 myhost kernel: CIFS VFS: No response for cmd 50 mid 39195
Feb 3 01:11:39 myhost kernel: CIFS VFS: No response to cmd 5 mid 59671
Feb 3 01:11:39 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 01:15:34 myhost kernel: CIFS VFS: No response to cmd 5 mid 26833
Feb 3 01:15:34 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 01:15:55 myhost kernel: CIFS VFS: No response for cmd 50 mid 26839
Feb 3 01:20:19 myhost kernel: CIFS VFS: No response to cmd 5 mid 59613
Feb 3 01:20:19 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 01:24:40 myhost kernel: CIFS VFS: No response to cmd 5 mid 26754
Feb 3 01:24:40 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 01:25:01 myhost kernel: CIFS VFS: No response for cmd 50 mid 26760
Feb 3 01:25:22 myhost kernel: CIFS VFS: No response for cmd 50 mid 26764
Feb 3 01:25:43 myhost kernel: CIFS VFS: No response for cmd 50 mid 26768
Feb 3 01:25:43 myhost kernel: CIFS VFS: Unexpected lookup error -112
Feb 3 02:12:27 myhost kernel: CIFS VFS: No response to cmd 5 mid 64725
Feb 3 02:12:27 myhost kernel: CIFS VFS: Send error in Flush = -11
Feb 3 08:24:17 myhost kernel: CIFS VFS: No response for cmd 114 mid 37457
Feb 3 08:41:26 myhost kernel: CIFS VFS: No response to cmd 5 mid 30290
Feb 3 08:41:26 myhost kernel: CIFS VFS: Send error in Flush = -11

cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 1.60RH
Active VFS Requests: 0
Servers:
1) Name: 10.1.1.6 Domain: MYDOM Uses: 1 OS: Windows Server 2012 R2 Standard 9600
   NOS: Windows Server 2012 R2 Standard 6.3 Capability: 0x1e3fc
   SMB session status: 1 TCP status: 1
   Local Users To Server: 1 SecMode: 0x3 Req On Wire: 0
   Shares:
   1) \\10.1.1.6\backup_export Mounts: 1 Type: NTFS DevInfo: 0x20020 Attributes: 0xc700ef
      PathComponentMax: 255 Status: 0x1 type: DISK

/etc/fstab:
//10.1.1.6/backup_export /backup_export cifs username=myuser,password=mypassword$,_netdev,uid=oracle,gid=oinstall 0 0

df -h:
//10.1.1.67/backup_export 24T 15T 9.2T 62% /backup_export

2) On the second node, the CIFS share hangs completely: RHEL 6U6, kernel 2.6.32-504.23.4.el6.x86_64

modinfo cifs:
filename: /lib/modules/2.6.32-504.23.4.el6.x86_64/kernel/fs/cifs/cifs.ko
version: 1.68
description: VFS to access servers complying with the SNIA CIFS Specification e.g. Samba and Windows
license: GPL
author: Steve French <sfrench@xxxxxxxxxx>
srcversion: 6A099079DA5C8D2B7A710D3
depends:
vermagic: 2.6.32-504.23.4.el6.x86_64 SMP mod_unload modversions
parm: CIFSMaxBufSize:Network buffer size (not including header). Default: 16384 Range: 8192 to 130048 (int)
parm: cifs_min_rcv:Network buffers in pool. Default: 4 Range: 1 to 64 (int)
parm: cifs_min_small:Small network buffers in pool. Default: 30 Range: 2 to 256 (int)
parm: cifs_max_pending:Simultaneous requests to server. Default: 32767 Range: 2 to 32767. (int)
parm: enable_oplocks:Enable or disable oplocks (bool). Default:y/Y/1 (bool)

df -h hangs

In /var/

[root@myhost02 log]# ps -ef |grep cifs
root 9515 2 0 2015 ? 00:00:01 [cifsiod]
root 12818 12398 0 09:46 pts/6 00:00:00 grep cifs
root 22144 2 0 Jan08 ? 03:18:51 [cifsd]

[root@myhost02 log]# cat /proc/22144/stack
[<ffffffff8144c539>] sk_wait_data+0xd9/0xe0
[<ffffffff814a77f7>] tcp_recvmsg+0x347/0x10f0
[<ffffffff814c922a>] inet_recvmsg+0x5a/0x90
[<ffffffff8144a5d3>] sock_recvmsg+0x133/0x160
[<ffffffff8144a644>] kernel_recvmsg+0x44/0x60
[<ffffffffa0378749>] cifs_readv_from_socket+0x1a9/0x260 [cifs]
[<ffffffffa0378827>] cifs_read_from_socket+0x27/0x30 [cifs]
[<ffffffffa0378983>] cifs_demultiplex_thread+0x153/0xe20 [cifs]
[<ffffffff8109e78e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

[root@myhost02 log]# cat /proc/9515/sta
cat: /proc/9515/sta: No such file or directory
[root@myhost02 log]# cat /proc/9515/stack
[<ffffffff8109818c>] worker_thread+0x1fc/0x2a0
[<ffffffff8109e78e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

[root@myhost02 log]# cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 1.68
Active VFS Requests: 10
Servers:
1) Name: 10.1.1.67 Domain: MYDOM Uses: 1 OS: Windows Server 2012 R2 Standard 9600
   NOS: Windows Server 2012 R2 Standard 6.3 Capability: 0x1e3fc
   SMB session status: 1 TCP status: 4
   Local Users To Server: 1 SecMode: 0x3 Req On Wire: 1
   Shares:
   1) \\10.1.1.67\backup_export Mounts: 1 Type: NTFS DevInfo: 0x20020 Attributes: 0xc700ef
      PathComponentMax: 255 Status: 0x2 type: DISK DISCONNECTED
   MIDs:
   State: 2 com: 114 pid: 4816 cbdata: ffff880106ab2040 mid 24743

Feb 3 05:46:33 atfkora02 kernel: CIFS VFS: Server 10.10.10.67 has not responded in 120 seconds. Reconnecting...
Feb 3 05:53:39 atfkora02 kernel: INFO: task bdi-default:69 blocked for more than 120 seconds.
Feb 3 05:53:39 atfkora02 kernel: Not tainted 2.6.32-504.23.4.el6.x86_64 #1
Feb 3 05:53:39 atfkora02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 3 05:53:39 atfkora02 kernel: bdi-default D 000000000000000b 0 69 2 0x00000000
Feb 3 05:53:39 atfkora02 kernel: ffff881029bef930 0000000000000046 0000000000000000 0000000700000007
Feb 3 05:53:39 atfkora02 kernel: ffff881000000000 0000000000000400 003841dbe1cae16f 0000000000000000
Feb 3 05:53:39 atfkora02 kernel: 0000000000000007 00000004b02fe0f9 ffff881029be7068 ffff881029beffd8
Feb 3 05:53:39 atfkora02 kernel: Call Trace:
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81215f39>] ? find_nls+0x59/0x100
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa03664b2>] cifs_reconnect_tcon+0x152/0x320 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81064c35>] ? wake_up_process+0x15/0x20
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa0366837>] small_smb_init+0x37/0x90 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa0370e7c>] cifs_async_writev+0x7c/0x280 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81139cd9>] ? test_set_page_writeback+0xe9/0x1a0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa037ef9f>] cifs_writepages+0x43f/0x700 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8105ca84>] ? find_busiest_group+0x244/0x9e0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81139871>] do_writepages+0x21/0x40
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811bafbd>] writeback_single_inode+0xdd/0x290
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811bb3bd>] writeback_sb_inodes+0xbd/0x170
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811bb51b>] writeback_inodes_wb+0xab/0x1b0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811bb913>] wb_writeback+0x2f3/0x410
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811bbbd5>] wb_do_writeback+0x1a5/0x240
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81087540>] ? process_timeout+0x0/0x10
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811482da>] bdi_forker_task+0x6a/0x310
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81148270>] ? bdi_forker_task+0x0/0x310
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Feb 3 05:53:39 atfkora02 kernel: INFO: task kslowd002:4819 blocked for more than 120 seconds.
Feb 3 05:53:39 atfkora02 kernel: Not tainted 2.6.32-504.23.4.el6.x86_64 #1
Feb 3 05:53:39 atfkora02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 3 05:53:39 atfkora02 kernel: kslowd002 D 0000000000000007 0 4819 2 0x00000080
Feb 3 05:53:39 atfkora02 kernel: ffff8801100f3bf0 0000000000000046 0000000000000000 0000000000000286
Feb 3 05:53:39 atfkora02 kernel: ffff8801100f3b70 ffffffff8108742c 003841deeab11af3 00000000ffffffff
Feb 3 05:53:39 atfkora02 kernel: ffff882029f1c000 00000004b03013f5 ffff88102584f068 ffff8801100f3fd8
Feb 3 05:53:39 atfkora02 kernel: Call Trace:
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8108742c>] ? lock_timer_base+0x3c/0x70
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81215f39>] ? find_nls+0x59/0x100
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa03664b2>] cifs_reconnect_tcon+0x152/0x320 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa0366837>] small_smb_init+0x37/0x90 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa0370e7c>] cifs_async_writev+0x7c/0x280 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffffa0371208>] cifs_writev_complete+0x188/0x270 [cifs]
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff811180f3>] slow_work_execute+0x233/0x310
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff81118327>] slow_work_thread+0x157/0x360
Feb 3 05:53:39 atfkora02 kernel: [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40

Questions:

1) These might be two independent problems. What additional information should I gather to find the root cause?
2) How can I fix the hanging CIFS mount without rebooting the Linux node? umount -f does not work; it reports "Device or resource busy".
3) Are there any recommendations on how to set up a large CIFS share for backups from several Linux nodes (directory structure, etc.)?

Thank you in advance,
Martin

--
Martin Decker