Hi Folks,

Last week I upgraded Gluster from 3.7.11 to 3.7.12. Unfortunately, I have since twice hit the problem of a task being blocked for more than 120 seconds, which left me with no option other than a hard reset of the node!

kernel: [1705322.676270] INFO: task apache2:3092 blocked for more than 120 seconds.
kernel: [1705322.682202] Not tainted 3.13.0-88-generic #135-Ubuntu
kernel: [1705322.682770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [1705322.683738] apache2 D ffff880a03d93180 0 3092 1800 0x00000000
kernel: [1705322.683761] ffff88016f865b38 0000000000000082 ffff8809cbde8000 0000000000013180
kernel: [1705322.683769] ffff88016f865fd8 0000000000013180 ffff8809cbde8000 ffff880a03d93a18
kernel: [1705322.683775] ffff880a03f92d90 0000000000000002 ffffffffa029e670 ffff88016f865bb0
kernel: [1705322.683782] Call Trace:
kernel: [1705322.683978] [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
kernel: [1705322.684043] [<ffffffff8172e10d>] io_schedule+0x9d/0x130
kernel: [1705322.684096] [<ffffffffa029e67e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
kernel: [1705322.684103] [<ffffffff8172e582>] __wait_on_bit+0x62/0x90
kernel: [1705322.684162] [<ffffffff81617018>] ? sk_reset_timer+0x18/0x30
kernel: [1705322.684193] [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
kernel: [1705322.684203] [<ffffffff8172e627>] out_of_line_wait_on_bit+0x77/0x90
kernel: [1705322.684234] [<ffffffff810adac0>] ? autoremove_wake_function+0x40/0x40
kernel: [1705322.684258] [<ffffffffa029ea33>] nfs_wait_on_request+0x33/0x40 [nfs]
kernel: [1705322.684276] [<ffffffffa02a3a40>] nfs_updatepage+0x150/0x660 [nfs]
kernel: [1705322.684290] [<ffffffffa0294dfb>] nfs_write_end+0x5b/0x340 [nfs]
kernel: [1705322.684318] [<ffffffff811522da>] generic_file_buffered_write+0x16a/0x250
kernel: [1705322.684336] [<ffffffff81153991>] __generic_file_aio_write+0x1c1/0x3d0
kernel: [1705322.684340] [<ffffffff81153bf8>] generic_file_aio_write+0x58/0xa0
kernel: [1705322.684354] [<ffffffffa029406b>] nfs_file_write+0xbb/0x1d0 [nfs]
kernel: [1705322.684387] [<ffffffff811c096a>] do_sync_write+0x5a/0x90
kernel: [1705322.684394] [<ffffffff811c10f4>] vfs_write+0xb4/0x1f0
kernel: [1705322.684399] [<ffffffff811c1b29>] SyS_write+0x49/0xa0
kernel: [1705322.684411] [<ffffffff8173a4dd>] system_call_fastpath+0x1a/0x1f

Some research turned up these links:

* https://www.gluster.org/pipermail/gluster-users/2016-July/027327.html
* http://serverfault.com/questions/500222/kernel-3-8-apache2-with-wsgi-info-task-apache2-blocked-for-more-than-120-sec
* https://www.novell.com/support/kb/doc.php?id=7010287l

The Gluster volume serves a home directory for apache/php-fpm, and the server is usually quite busy in terms of requests. With 3.7.11 I did not have any problems over the last few weeks, but I am unsure whether this is really related to the 3.7.11 -> 3.7.12 upgrade (or is it just the file system blocking?).

My setup is:

# dpkg -l | grep gluster
ii  glusterfs-client  3.7.12-ubuntu1~trusty1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.7.12-ubuntu1~trusty1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.7.12-ubuntu1~trusty1  amd64  clustered file-system (server package)

The Gluster volume is mounted via NFS on the same host that serves the volume:

XXX:/gldata on /home/gldata type nfs (rw,nfsvers=3,addr=XXX,_netdev)

My questions:

* Are there any known race conditions with 3.7.12 and NFS?
* Should I apply the vm.dirty_ratio settings mentioned in the links above?
* Should I downgrade to 3.7.11 or upgrade to 3.7.13?

I appreciate any help.

Thanks,
Georg