Hi Brian,

OK, tcpdump and strace are pouring out output. The client is doing a lot and is connected to both .40 and .41.

During my tests I found that the hangs are random. There is one touch I have been trying all day, and it is still hanging. Touches on other directories/filenames work only some of the time.

Here is what I found using dmesg:

[17514.155548] INFO: task touch:25873 blocked for more than 120 seconds.
[17514.155583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17514.155627] touch D 0000000000000000 0 25873 20727 0x00000004
[17514.155630] ffff88023f0e2a60 0000000000000086 0000000000000000 ffffffff81049fd0
[17514.155633] ffff8801dc7c3c28 000000000000f9e0 ffff8801dc7c3fd8 0000000000015780
[17514.155635] 0000000000015780 ffff88023bc8e2e0 ffff88023bc8e5d8 000000058103a866
[17514.155638] Call Trace:
[17514.155644] [<ffffffff81049fd0>] ? try_to_wake_up+0x249/0x259
[17514.155646] [<ffffffff8103f80c>] ? __wake_up+0x30/0x44
[17514.155651] [<ffffffffa0181ab1>] ? fuse_request_send+0x1a2/0x255 [fuse]
[17514.155654] [<ffffffff810649da>] ? autoremove_wake_function+0x0/0x2e
[17514.155657] [<ffffffffa0182992>] ? fuse_request_alloc+0x22/0x27 [fuse]
[17514.155660] [<ffffffffa0187da2>] ? fuse_file_alloc+0xc4/0xeb [fuse]
[17514.155663] [<ffffffffa0184646>] ? fuse_create+0x1ce/0x38f [fuse]
[17514.155667] [<ffffffff810f7180>] ? vfs_create+0x6d/0x89
[17514.155669] [<ffffffff810f80a9>] ? do_filp_open+0x31e/0x94b
[17514.155673] [<ffffffff810cc2d5>] ? handle_mm_fault+0x3b8/0x80f
[17514.155676] [<ffffffff810ec8af>] ? do_sys_open+0x55/0xfc
[17514.155678] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

That does not look good. It reminds me of a kernel/gluster bug I had a year ago, which could only be fixed by turning off "quickread" (it is still off, I checked).

Any other ideas?

- Stefan

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On behalf of Stefan Becker
Sent: Sunday, 5 February 2012 22:30
To: Brian Candler
Cc: gluster-users at gluster.org
Subject: Re: Hanging writes after upgrading "clients" to debian squeeze

Hi Brian,

thanks for your help. I will play around with what you suggested and come back with results or the solution :)

Greets,
Stefan

-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com]
Sent: Sunday, 5 February 2012 22:28
To: Stefan Becker
Cc: Whit Blauvelt; gluster-users at gluster.org
Subject: Re: Hanging writes after upgrading "clients" to debian squeeze

On Sun, Feb 05, 2012 at 09:49:47PM +0100, Stefan Becker wrote:
> - no ip tables involved

OK. So how about this on the client:

    tcpdump -i eth0 -nn host 10.10.100.40 or host 10.10.100.41

(replace eth0 as necessary)

That will show you traffic to and from the bricks. When you issue a write (e.g. touch /path/to/foo), does traffic only go out to one brick? Do you see any TCP retransmissions?

Does 'netstat -nt' show TCP connections to both bricks? Does Send-Q stay at zero most of the time, or is it stuck at a non-zero value?

You could also try:

    strace -p <pid-of-glusterfs-process>

on the client as well. You should see writev(fd,...) and readv(fd,...) with different fds for communication to each of the bricks. Then try issuing a single write.

The strace output may not tell you much by itself, but if you compare what you see on a non-upgraded (working) client versus an upgraded (broken) client, you might be able to see what it's getting stuck on.

Regards,

Brian.
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
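
Brian's 'netstat -nt' questions (is there a TCP connection to each brick, and does Send-Q stay at zero?) could be scripted roughly as below. This is only a sketch, not something from the thread: the function names are made up for illustration, it assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and it only matches plain IPv4 addresses. The two brick addresses are the ones from the tcpdump command above.

```python
import subprocess

# Brick addresses taken from the tcpdump command in the thread.
BRICKS = ("10.10.100.40", "10.10.100.41")


def parse_netstat(text, bricks=BRICKS):
    """Return (brick_ip, port, recv_q, send_q, state) for each TCP
    connection whose foreign address is one of the bricks.

    Assumes the Linux `netstat -nt` column order:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the banner, the column header, and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        # Foreign address looks like "10.10.100.40:PORT" for IPv4.
        host, _, port = fields[4].rpartition(":")
        if host in bricks:
            rows.append((host, port, recv_q, send_q, fields[5]))
    return rows


def stuck_connections(rows):
    """Brick connections with a non-zero Send-Q in this snapshot."""
    return [row for row in rows if row[3] > 0]


def missing_bricks(rows, bricks=BRICKS):
    """Bricks that have no TCP connection at all in this snapshot."""
    seen = {row[0] for row in rows}
    return [brick for brick in bricks if brick not in seen]


def read_netstat():
    """Run `netstat -nt` on the client and return its output as text."""
    result = subprocess.run(
        ["netstat", "-nt"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A single snapshot with a non-zero Send-Q proves little, since data is briefly queued during normal operation. Calling parse_netstat(read_netstat()) every second or so while the hanging touch runs, and checking whether the same connection keeps showing up in stuck_connections(), is closer to what Brian is asking about.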