Hi Ashish and other Gluster users,

When I put some heavy IO load onto my cluster (an rsync operation, ~600 MB/s), one of the nodes instantly gets inode-locked and tears down the whole cluster. I've already turned on "features.lock-heal", but it didn't help.
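For reference, this is roughly how I enabled the option (a sketch; the volume name mainvol is taken from the statedump below):

    # enable lock healing on the volume (boolean option; value assumed to be on/off)
    gluster volume set mainvol features.lock-heal on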
My clients use a round-robin scheme to mount the servers, in the hope of spreading the load. Could the hang be caused by a race between the NFS servers on different nodes? Should I instead set up a dedicated NFS server with plenty of memory, no bricks, and multiple Ethernet links?
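To be concrete, the round-robin mounts look roughly like this (hostnames are from the statedump below; the mount point and exact options here are illustrative):

    # client A mounts via the NFS server on sm11
    mount -t nfs -o vers=3 sm11:/mainvol /mnt/mainvol
    # client B mounts the same volume via the NFS server on sm12
    mount -t nfs -o vers=3 sm12:/mainvol /mnt/mainvol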
I really appreciate any help from you guys.

Best wishes,
Chen

PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.
On 4/4/2016 6:11 PM, Ashish Pandey wrote:
Hi Chen,

As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.

=============================================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
=============================================

This could be the cause of the hang.

Possible workaround: if there is no IO going on for this volume, we can restart the volume with "gluster v start <volume-name> force". This will also restart the NFS process, which will release the locks, and we should come out of this issue.

Ashish
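For reference, the statedump inspection and the suggested workaround correspond to commands roughly like the following (assuming the volume is named mainvol, as in the dump above, and the default statedump directory; paths may differ on your setup):

    # take a fresh statedump of the brick processes
    # (dumps are usually written under /var/run/gluster)
    gluster volume statedump mainvol

    # look for blocked inode locks in the dump files
    grep -B1 -A1 BLOCKED /var/run/gluster/*.dump.*

    # if no IO is in flight, force-restart the volume; this also restarts
    # the NFS process, which releases the stuck locks
    gluster volume start mainvol force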
--
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen@xxxxxxxxxxxxxxxx
Web: www.smartquerier.com