NFS blocked for more than 120 seconds on gluster 3.7.12

Georg Schoenberger <g.schoenberger@xxxxxxxxxx> · Mon, 11 Jul 2016 16:13:23 +0000



    Hi Folks,

    
    Last week I upgraded gluster from 3.7.11 to 3.7.12. Unfortunately, I
    have since twice hit the problem of a task being blocked for more than
    120 seconds, leaving me with no option other than a hard reset of the node!

    
    kernel: [1705322.676270] INFO: task apache2:3092 blocked for more than 120 seconds.
    kernel: [1705322.682202]       Not tainted 3.13.0-88-generic #135-Ubuntu
    kernel: [1705322.682770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kernel: [1705322.683738] apache2         D ffff880a03d93180     0  3092   1800 0x00000000
    kernel: [1705322.683761]  ffff88016f865b38 0000000000000082 ffff8809cbde8000 0000000000013180
    kernel: [1705322.683769]  ffff88016f865fd8 0000000000013180 ffff8809cbde8000 ffff880a03d93a18
    kernel: [1705322.683775]  ffff880a03f92d90 0000000000000002 ffffffffa029e670 ffff88016f865bb0
    kernel: [1705322.683782] Call Trace:
    kernel: [1705322.683978]  [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
    kernel: [1705322.684043]  [<ffffffff8172e10d>] io_schedule+0x9d/0x130
    kernel: [1705322.684096]  [<ffffffffa029e67e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
    kernel: [1705322.684103]  [<ffffffff8172e582>] __wait_on_bit+0x62/0x90
    kernel: [1705322.684162]  [<ffffffff81617018>] ? sk_reset_timer+0x18/0x30
    kernel: [1705322.684193]  [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
    kernel: [1705322.684203]  [<ffffffff8172e627>] out_of_line_wait_on_bit+0x77/0x90
    kernel: [1705322.684234]  [<ffffffff810adac0>] ? autoremove_wake_function+0x40/0x40
    kernel: [1705322.684258]  [<ffffffffa029ea33>] nfs_wait_on_request+0x33/0x40 [nfs]
    kernel: [1705322.684276]  [<ffffffffa02a3a40>] nfs_updatepage+0x150/0x660 [nfs]
    kernel: [1705322.684290]  [<ffffffffa0294dfb>] nfs_write_end+0x5b/0x340 [nfs]
    kernel: [1705322.684318]  [<ffffffff811522da>] generic_file_buffered_write+0x16a/0x250
    kernel: [1705322.684336]  [<ffffffff81153991>] __generic_file_aio_write+0x1c1/0x3d0
    kernel: [1705322.684340]  [<ffffffff81153bf8>] generic_file_aio_write+0x58/0xa0
    kernel: [1705322.684354]  [<ffffffffa029406b>] nfs_file_write+0xbb/0x1d0 [nfs]
    kernel: [1705322.684387]  [<ffffffff811c096a>] do_sync_write+0x5a/0x90
    kernel: [1705322.684394]  [<ffffffff811c10f4>] vfs_write+0xb4/0x1f0
    kernel: [1705322.684399]  [<ffffffff811c1b29>] SyS_write+0x49/0xa0
    kernel: [1705322.684411]  [<ffffffff8173a4dd>] system_call_fastpath+0x1a/0x1f
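
To see which tasks were reported as hung and how often, the kernel log can be tallied. This is only a minimal sketch: the helper name `hung_tasks` and the log path in the usage note are assumptions, not part of the original report, and it expects each kernel message on a single unwrapped line.

```shell
# Hypothetical helper: count hung-task reports per task in a kernel log file.
# $1 is the path to the log file to scan.
hung_tasks() {
    # Keep only the "INFO: task NAME:PID blocked for more than N seconds" part,
    # then print the NAME:PID field and count how often each one occurs.
    grep -o 'INFO: task [^ ]* blocked for more than [0-9]* seconds' "$1" \
        | awk '{print $3}' \
        | sort \
        | uniq -c
}
```

On Ubuntu this would typically be called as `hung_tasks /var/log/kern.log` (assuming kernel messages are logged there); each output line is a count followed by `name:pid`.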

    
    Some research turned up these links:

    * https://www.gluster.org/pipermail/gluster-users/2016-July/027327.html
    * http://serverfault.com/questions/500222/kernel-3-8-apache2-with-wsgi-info-task-apache2-blocked-for-more-than-120-sec
    * https://www.novell.com/support/kb/doc.php?id=7010287l

    
    The gluster volume serves a home directory for apache/php-fpm, and
    the server is usually quite busy in terms of requests.

    With 3.7.11 I did not have any problems over the last few weeks, so I
    am unsure whether this is related to the 3.7.11 -> 3.7.12 upgrade
    (or is it just the file system blocking?).

    
    My setup is:

    # dpkg -l | grep gluster
    ii  glusterfs-client   3.7.12-ubuntu1~trusty1   amd64   clustered file-system (client package)
    ii  glusterfs-common   3.7.12-ubuntu1~trusty1   amd64   GlusterFS common libraries and translator modules
    ii  glusterfs-server   3.7.12-ubuntu1~trusty1   amd64   clustered file-system (server package)

    The gluster volume is mounted via NFS on the same host that serves it:

    XXX:/gldata on /home/gldata type nfs (rw,nfsvers=3,addr=XXX,_netdev)

    
    Are there any known race conditions with 3.7.12 and NFS?

    Should I apply the vm.dirty_ratio settings mentioned in the links above?
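
Before changing anything, the current writeback thresholds can be read from `/proc/sys/vm`. The following is a minimal sketch; the helper name `show_dirty_settings` is made up for illustration, and the values in the trailing comment are placeholders rather than a recommendation from the linked pages.

```shell
# Hypothetical helper: print the current dirty-page writeback thresholds.
# $1 is an optional directory to read from; it defaults to /proc/sys/vm.
show_dirty_settings() {
    dir="${1:-/proc/sys/vm}"
    for name in dirty_ratio dirty_background_ratio; do
        printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

# Changing a value at runtime (as root) would look like this; the numbers
# are illustrative assumptions only:
#   sysctl -w vm.dirty_background_ratio=5
#   sysctl -w vm.dirty_ratio=10
```

Calling `show_dirty_settings` without arguments prints the live values, which makes it easy to note them down before experimenting so they can be restored later.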

    Should I downgrade to 3.7.11 or upgrade to 3.7.13?

    
    I appreciate any help,

    THX Georg

    