After I updated to Gluster 3.2.0, I hit the same problem on two separate distributed-replicated setups: a process reading from the mount hung in the uninterruptible disk sleep ('D') state. Nothing short of a reboot would get rid of the process; not even kill -9 worked. strace showed nothing once the process entered the 'D' state. It may be a FUSE problem, or it may not.

I had only been running 3.1.4 for about a month and a half, but with 3.2.0 the problem showed up right away, within a day.

* First instance of problem

Occurred while rsyncing files from a client's Gluster mount to a remote server, e.g.:

    rsync /gluster_path user@host::remote_path

Setup: 2 Gluster servers, RAID1 replicated
Servers/Client: RHEL 5.5, kernel 2.6.18-238.1.1el5.x86_64
fuse-libs-2.7.4-8.el5
glusterfs-core-3.2.0-1
fuse-2.7.4-8.el5
glusterfs-fuse-3.2.0-1

Errors:

INFO: task rsync:10652 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffff81000103f1a0 0 10652 10642 (NOTLB)
 ffff81021d641c08 0000000000000086 ffff810133ef61d0 ffffffff883e8219
 ffff810201df8600 000000000000000a ffff8101121c9080 ffff81022fce1100
 000616db22314a67 000000000000f9a2 ffff8101121c9268 00000007883ecf35
Call Trace:
 [<ffffffff883e8219>] :fuse:flush_bg_queue+0x2b/0x48
 [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
 [<ffffffff80028b0b>] sync_page+0x0/0x43
 [<ffffffff800637ca>] io_schedule+0x3f/0x67
 [<ffffffff80028b49>] sync_page+0x3e/0x43
 [<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff8003fdc1>] __lock_page+0x5e/0x64
 [<ffffffff800a28e2>] wake_bit_function+0x0/0x23
 [<ffffffff8000c3d7>] do_generic_mapping_read+0x1df/0x359
 [<ffffffff8000d1c3>] file_read_actor+0x0/0x159
 [<ffffffff8000c69d>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff800c8c08>] generic_file_read+0xac/0xc5
 [<ffffffff801ab4a8>] tty_default_put_char+0x1d/0x1f
 [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8003a8cf>] tty_ldisc_deref+0x68/0x7b
 [<ffffffff8000b78d>] vfs_read+0xcb/0x171
 [<ffffffff80011c7e>] sys_read+0x45/0x6e
 [<ffffffff8005d116>] system_call+0x7e/0x83

* Second instance of problem

Occurred with Apache accessing a CGI script on the client's Gluster mount. The process was again stuck in the 'D' state, as on the other setup.

Setup: 2 Gluster servers, RAID1 replicated
Servers: CentOS 5.6, kernel 2.6.18-194.32.1.el5.x86_64
Client: CentOS 5.6, kernel 2.6.18-238.9.1.el5
fuse-2.7.4-8.el5
glusterfs-core-3.2.0-1
fuse-libs-2.7.4-8.el5
glusterfs-fuse-3.2.0-1

Errors:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
index.cgi D ffff810001c7eaa0 0 14772 26749 14840 (NOTLB)
 ffff81002b3efc08 0000000000000082 ffff81005e929920 ffffffff885121ac
 ffff81001772fd00 0000000000000009 ffff810013180100 ffff810001fd30c0
 0013494028f4652a 00000000001826e3 ffff8100131802e8 00000001000d597a
Call Trace:
 [<ffffffff885121ac>] :fuse:request_send_nowait+0x56/0x78
 [<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
 [<ffffffff80028b44>] sync_page+0x0/0x43
 [<ffffffff800637ea>] io_schedule+0x3f/0x67
 [<ffffffff80028b82>] sync_page+0x3e/0x43
 [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff8003fce0>] __lock_page+0x5e/0x64
 [<ffffffff800a0b8d>] wake_bit_function+0x0/0x23
 [<ffffffff8000c373>] do_generic_mapping_read+0x1df/0x359
 [<ffffffff8000d18c>] file_read_actor+0x0/0x159
 [<ffffffff8000c639>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff800c6964>] generic_file_read+0xac/0xc5
 [<ffffffff800a0b5f>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80062ff8>] thread_return+0x62/0xfe
 [<ffffffff8000b729>] vfs_read+0xcb/0x171
 [<ffffffff80011c15>] sys_read+0x45/0x6e
 [<ffffffff8005d116>] system_call+0x7e/0x83

-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
abiacco at formatdynamics.com
http://www.formatdynamics.com