gluster brick hang/High CPU load after 10 hours file transfer test


 



Hi,


I encountered a gluster hang after a 10-hour file transfer test.


Versions: gluster 3.7.14, nfs-ganesha 2.3.2.


We are running on a 56-core Supermicro machine.


>sudo system-docker stats gluster nfs

CONTAINER   CPU %      MEM USAGE / LIMIT     MEM %   NET I/O     BLOCK I/O
gluster     2694.74%   2.434 GB / 270.4 GB   0.90%   0 B / 0 B   0 B / 1.073 MB
nfs         30.07%     146.6 MB / 270.4 GB   0.05%   0 B / 0 B   4.096 kB / 0 B


>top capture:

root     S    2556m   0%   24% /usr/local/sbin/glusterfsd -s denali-bm-qa-45 --volfile-id gluster-volume



I attached gdb to one of the glusterfsd threads. It reported:


#0  pthread_spin_lock () at ../sysdeps/x86_64/nptl/pthread_spin_lock.S:32
#1  0x00007f945f379ae5 in pl_inode_get (this=this@entry=0x7f9460010720, inode=inode@entry=0x7f943ffe1edc) at common.c:416
#2  0x00007f945f3883be in pl_common_inodelk (frame=0x7f9467dc2ed8, this=0x7f9460010720, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0", inode=0x7f943ffe1edc, cmd=6, flock=0x7f94678653d8, loc=0x7f94678652d8, fd=0x0,
    xdata=0x7f946a2e9180) at inodelk.c:743
#3  0x00007f945f388e27 in pl_inodelk (frame=<optimized out>, this=<optimized out>, volume=<optimized out>, loc=<optimized out>, cmd=<optimized out>, flock=<optimized out>, xdata=0x7f946a2e9180) at inodelk.c:816
#4  0x00007f946a00b5c6 in default_inodelk (frame=0x7f9467dc2ed8, this=0x7f9460011bf0, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0", loc=0x7f94678652d8, cmd=6, lock=0x7f94678653d8, xdata=0x7f946a2e9180) at defaults.c:2032
#5  0x00007f946a01e324 in default_inodelk_resume (frame=0x7f9467dbabd4, this=0x7f9460013070, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0", loc=0x7f94678652d8, cmd=6, lock=0x7f94678653d8, xdata=0x7f946a2e9180) at defaults.c:1589
#6  0x00007f946a03c1ce in call_resume_wind (stub=<optimized out>) at call-stub.c:2210
#7  0x00007f946a03c5bd in call_resume (stub=0x7f9467865298) at call-stub.c:2576
#8  0x00007f945ef5b2b2 in iot_worker (data="" at io-threads.c:215
#9  0x00007f946979270a in start_thread (arg=0x7f943cd5e700) at pthread_create.c:333
#10 0x00007f94694c882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

It shows that many glusterfsd threads are spinning in pthread_spin_lock, waiting for the lock to be released. That busy-waiting is what drives the CPU load so high.
           |-glusterfsd(772)-+-{glusterfsd}(773)
           |                 |-{glusterfsd}(774)
           |                 |-{glusterfsd}(775)
           |                 |-{glusterfsd}(776)
           |                 |-{glusterfsd}(777)
           |                 |-{glusterfsd}(778)
           |                 |-{glusterfsd}(779)
           |                 |-{glusterfsd}(780)
           |                 |-{glusterfsd}(781)
           |                 |-{glusterfsd}(782)
           |                 |-{glusterfsd}(783)
           |                 |-{glusterfsd}(784)
           |                 |-{glusterfsd}(785)
           |                 |-{glusterfsd}(786)
           |                 |-{glusterfsd}(787)
           |                 |-{glusterfsd}(788)
           |                 `-{glusterfsd}(789)
           |-glusterfsd(791)-+-{glusterfsd}(792)
           |                 |-{glusterfsd}(793)
           |                 |-{glusterfsd}(794)
           |                 |-{glusterfsd}(795)
           |                 |-{glusterfsd}(796)
           |                 |-{glusterfsd}(797)
           |                 |-{glusterfsd}(798)
           |                 |-{glusterfsd}(799)
           |                 |-{glusterfsd}(800)
           |                 |-{glusterfsd}(801)
           |                 |-{glusterfsd}(802)
           |                 |-{glusterfsd}(803)
           |                 |-{glusterfsd}(804)
           |                 |-{glusterfsd}(805)
           |                 |-{glusterfsd}(806)
           |                 |-{glusterfsd}(807)
           |                 `-{glusterfsd}(808)



If we just wait a few hours, the system recovers and returns to normal.


I am wondering how to dig deeper to find out what caused one of the threads to hold the lock for so long. Please give me your professional advice.


Best Regards!


James Zhu


 
