Got a complaint from a user - the native GlusterFS mountpoint was completely inaccessible from many (if not all) clients attempting to read from or write to it. Apparently not the fault of GlusterFS - here's the entry from the messages file:

Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.692284] INFO: task glusterfsd:12902 blocked for more than 120 seconds.
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.692544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.693037] glusterfsd D ffffffff80151248 0 12902 1 12904 12903 (NOTLB)
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.693553] ffff81061190bbf8 0000000000000086 ffff81061190bea8 0000000000000000
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.694099] 000000000000000c 000000000000000a ffff810627eec0c0 ffff810c27f32100
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.694660] 000abc5dc58f770c 0000000000005135 ffff810627eec2a8 000000038000b3fd

... and here's one for a non-Gluster process:

Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.761299] INFO: task jbd2/cciss!c2d0:4090 blocked for more than 120 seconds.
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.761908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.762505] jbd2/cciss!c2 D ffffffff80151248 0 4090 456 4091 4085 (L-TLB)
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.763129] ffff810617e45d60 0000000000000046 ffff810617e45da0 ffffffff8008ccb0
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.763753] ffff810617e45cf0 000000000000000a ffff81063d22e820 ffff810c20b3c100
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.764370] 000abbf070cd535b 0000000000003c6a ffff81063d22ea08 0000000300000000
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.764693] Call Trace:
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.765247] [<ffffffff8008ccb0>] find_busiest_group+0x20d/0x621
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.765543] [<ffffffff88342fad>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1080
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766064] [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766327] [<ffffffff8003ddd5>] lock_timer_base+0x1b/0x3c
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766588] [<ffffffff8004b6b6>] try_to_del_timer_sync+0x7f/0x88
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766853] [<ffffffff88346d72>] :jbd2:kjournald2+0x9a/0x1ec
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767109] [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767374] [<ffffffff88346cd8>] :jbd2:kjournald2+0x0/0x1ec
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767627] [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767880] [<ffffffff80032bdc>] kthread+0xfe/0x132
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768138] [<ffffffff8005efb1>] child_rip+0xa/0x11
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768399] [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768656] [<ffffffff80032ade>] kthread+0x0/0x132
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768922] [<ffffffff8005efa7>] child_rip+0x0/0x11

Haven't found the specific bug number
for this (CentOS 5.5) yet. Running GlusterFS 3.1.3 on the clients and on 2 servers set up as Replicated-Distribute.

Hopefully this will help others. I will be upgrading these servers to CentOS 5.6 as soon as possible.

Kudos to my coworker Joe Collette for running this issue to ground.

James Burnash
Unix Engineer
Knight Capital Group
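
For anyone who wants to check whether their own servers are logging the same hung-task warnings, here is a minimal Python sketch (an illustration only, not something that was run as part of this diagnosis). It matches nothing more than the "blocked for more than N seconds" line shown in the excerpts above. The function name find_hung_tasks is made up for this example, and /var/log/messages in the usage note is an assumption about where your syslog writes kernel messages.

```python
import re

# Matches the kernel's hung-task warning as it appears in the log excerpts.
# The task name is matched lazily so names containing '/' or '!' still work.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a list of (task name, pid, seconds) for each hung-task warning.

    `lines` is any iterable of log lines, e.g. an open file object.
    find_hung_tasks is a hypothetical helper name, not part of any tool.
    """
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

To use it, pass an open log file, for example `find_hung_tasks(open("/var/log/messages", errors="replace"))`, and look at which task names come back. Seeing both glusterfsd and a jbd2 journal thread in the list, as in the excerpts above, suggests the stall sits below GlusterFS.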