Hi,

I'm new to working with clusters and I've started setting up Samba with HA. For that, I have a Supermicro "storage server" with sixteen 2 TB drives and dual server slots that access the drives directly. I've set up an md RAID 6 array across the drives, created an LVM volume on top of the array with a single LV, and put a GFS2 file system on it to provide common storage for the servers. Only one of the servers, running CentOS 6.4, is a member of the cluster right now.

From time to time, some users get corrupted files when they copy to the file server. I asked on the Samba list about it and they pointed me to this list for help. The error I've found is this:

Oct 22 17:50:04 nas kernel: INFO: task smbd:5994 blocked for more than 120 seconds.
Oct 22 17:50:04 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:50:04 nas kernel: smbd D 0000000000000000 0 5994 4870 0x00000084
Oct 22 17:50:04 nas kernel: ffff88017c61b948 0000000000000086 ffff8801bb44b500 0000000000000000
Oct 22 17:50:04 nas kernel: 00000000ffffffff 00000000ffffffff ffff88017c61b998 ffffffff8105b4d3
Oct 22 17:50:04 nas kernel: ffff880175b03058 ffff88017c61bfd8 000000000000fb88 ffff880175b03058
Oct 22 17:50:04 nas kernel: Call Trace:
Oct 22 17:50:04 nas kernel: [<ffffffff8105b4d3>] ? perf_event_task_sched_out+0x33/0x80
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055757e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fef1f>] __wait_on_bit+0x5f/0x90
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fefc8>] out_of_line_wait_on_bit+0x78/0x90
Oct 22 17:50:04 nas kernel: [<ffffffff81092190>] ? wake_bit_function+0x0/0x50
Oct 22 17:50:04 nas kernel: [<ffffffffa05594f5>] gfs2_glock_wait+0x45/0x90 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055a8f7>] gfs2_glock_nq+0x237/0x3d0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c5c9>] gfs2_inode_lookup+0x129/0x300 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa054ea4d>] ? gfs2_dirent_search+0x16d/0x1a0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa054efbe>] gfs2_dir_search+0x5e/0x80 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c32e>] gfs2_lookupi+0xde/0x1e0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0559ca8>] ? do_promote+0x208/0x330 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c39d>] ? gfs2_lookupi+0x14d/0x1e0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0569c76>] gfs2_lookup+0x36/0xd0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff81193ed7>] ? d_alloc+0x137/0x1b0
Oct 22 17:50:04 nas kernel: [<ffffffff81189935>] do_lookup+0x1a5/0x230
Oct 22 17:50:04 nas kernel: [<ffffffff81189ccd>] __link_path_walk+0x20d/0x1030
Oct 22 17:50:04 nas kernel: [<ffffffff814a1d1a>] ? inet_recvmsg+0x5a/0x90
Oct 22 17:50:04 nas kernel: [<ffffffff8118ad7a>] path_walk+0x6a/0xe0
Oct 22 17:50:04 nas kernel: [<ffffffff8118af4b>] do_path_lookup+0x5b/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff8118bbb7>] user_path_at+0x57/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff81092150>] ? autoremove_wake_function+0x0/0x40
Oct 22 17:50:04 nas kernel: [<ffffffff811807ec>] vfs_fstatat+0x3c/0x80
Oct 22 17:50:04 nas kernel: [<ffffffff8118095b>] vfs_stat+0x1b/0x20
Oct 22 17:50:04 nas kernel: [<ffffffff81180984>] sys_newstat+0x24/0x50
Oct 22 17:50:04 nas kernel: [<ffffffff810d6ce2>] ? audit_syscall_entry+0x272/0x2a0
Oct 22 17:50:04 nas kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Can anyone help me debug this? I'm trying to find out whether this is a GFS2 problem or whether it may be related to the hardware.
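To see whether every hung-task report blocks at the same point, it can help to collect all of them from the syslog and tally where each task was sleeping. The Python below is a minimal sketch, assuming the standard kernel hung-task format seen in the log (an "INFO: task NAME:PID blocked" header followed by "[<address>] function+offset/size [module]" frames) and a syslog file such as /var/log/messages; the script name hung_tasks.py and all helper names are hypothetical. Frames marked "?" are skipped because they are the stack unwinder's guesses rather than confirmed callers.

```python
import re
import sys
from collections import Counter

# Header of a kernel hung-task report, e.g.
# "INFO: task smbd:5994 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g. "[<ffffffffa055757e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]".
# A "? " before the symbol marks an unreliable frame; the module name is optional.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_hung_tasks(lines):
    """Group reliable call-trace frames under the hung-task header that precedes them."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {"comm": header["comm"], "pid": int(header["pid"]), "frames": []}
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only confirmed frames, and only once a report header has been seen.
        if current is not None and frame and not frame["unreliable"]:
            current["frames"].append((frame["func"], frame["module"]))
    return reports


def gfs2_wait_point(report):
    """Return (innermost, outermost) gfs2 frames: where the task sleeps and how it entered gfs2."""
    gfs2 = [func for func, module in report["frames"] if module == "gfs2"]
    return (gfs2[0], gfs2[-1]) if gfs2 else None


def summarize(lines):
    """Count blocked tasks per (process name, gfs2 wait point)."""
    return Counter((r["comm"], gfs2_wait_point(r)) for r in parse_hung_tasks(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python hung_tasks.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for (comm, point), count in summarize(log).most_common():
            print(f"{count:4d}  {comm:12s}  {point}")
```

For the trace above, the smbd report would be counted under the pair (gfs2_glock_holder_wait, gfs2_lookup), i.e. a stat() lookup waiting for a GFS2 glock. The tally only shows where tasks sleep; on its own it does not distinguish a GFS2 problem from slow or failing hardware.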
I started a GFS2 fsck, but it takes too long, and I would rather try to narrow down the possible cause before taking the server out of production for more than 2 days (that is my estimate, based on how long it took to get partway through pass 1).

Regards,

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster