Hi Ashish and other Gluster users,

When I put some heavy IO load onto my cluster (an rsync operation, ~600 MB/s), one of the nodes instantly gets inode-locked and tears down the whole cluster. I've already turned on "features.lock-heal", but it didn't help.
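For reference, this is roughly how I enabled the option (a sketch; the volume name mainvol is taken from the statedump below):

    # enable lock healing on the volume (boolean option; value assumed to be on/off)
    gluster volume set mainvol features.lock-heal on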
My clients use a round-robin scheme to mount the servers, in the hope of spreading the load. Could the hang be caused by a race between the NFS servers on different nodes? Should I instead set up a dedicated NFS server with plenty of memory, no bricks, and multiple Ethernet links?
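To be concrete, the round-robin mounts look roughly like this (hostnames are from the statedump below; the mount point and exact options here are illustrative):

    # client A mounts via the NFS server on sm11
    mount -t nfs -o vers=3 sm11:/mainvol /mnt/mainvol
    # client B mounts the same volume via the NFS server on sm12
    mount -t nfs -o vers=3 sm12:/mainvol /mnt/mainvol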
I really appreciate any help from you guys.

Best wishes,
Chen

PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.
On 4/4/2016 6:11 PM, Ashish Pandey wrote:
Hi Chen,

As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.

=============================================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
=============================================

This could be the cause of the hang.

Possible workaround: if there is no IO going on for this volume, we can restart the volume with "gluster v start <volume-name> force". This will also restart the NFS process, which will release the locks, and we should come out of this issue.

Ashish
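For reference, the statedump inspection and the suggested workaround correspond to commands roughly like the following (assuming the volume is named mainvol, as in the dump above, and the default statedump directory; paths may differ on your setup):

    # take a fresh statedump of the brick processes
    # (dumps are usually written under /var/run/gluster)
    gluster volume statedump mainvol

    # look for blocked inode locks in the dump files
    grep -B1 -A1 BLOCKED /var/run/gluster/*.dump.*

    # if no IO is in flight, force-restart the volume; this also restarts
    # the NFS process, which releases the stuck locks
    gluster volume start mainvol force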
--
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen@xxxxxxxxxxxxxxxx
Web: www.smartquerier.com