Hi Ashish,

Thank you for your quick response!

Yes, the volume became unresponsive within 3 minutes after I initiated the rsync. The nodes went down one by one. The cluster monitor showed me the whole procedure.
At first, the node mounted by my rsync client (sm15) jumped to a huge load (55; it was <10 before). Within a few seconds, another node's network I/O dropped to nearly zero, then the 3rd, then the 4th, until all of them went down one by one.
On all NFS clients, "strace ls /data" (my mount point) got stuck at "stat("/data",".
The locked node is not reachable by ssh now, but peer status says it is connected, and volume status reports that its NFS server and bricks are online. My cluster monitor daemon on it is also alive.
"gluster volume start <volname> force" times out. "showmount -e <nodename>" run against the other nodes works, but not against the locked node. If I force a shutdown of the locked node or unplug its 10Gb cable, the volume returns to work almost immediately.
The statedump showed these lines (as you have noted before):

======================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/workdir/NTD/bam/A1703.bam
mandatory=0
inodelk-count=11
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=3cc5a0e4627f0000, client=0x7f03b8082150, connection-id=hw10-48926-2016/04/13-07:23:01:395332-mainvol-client-0-0, blocked at 2016-04-13 08:30:11, granted at 2016-04-13 08:31:09
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=80894744477f0000, client=0x7f03b80a8ef0, connection-id=sm12-4956-2016/04/13-07:22:44:529032-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=1827fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=6c20fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
...
======================

Since the performance of the native FUSE client is unsatisfactory, I use a round-robin DNS policy to distribute load over the cluster and provide failover. Clients requesting a mount are resolved to different nodes, so the load gets averaged out (an NFS mount does cause a heavy memory footprint on the server, right?). A client stays tied to its specific node until it unmounts the share.
According to the statedump, the inode locks were always granted to one node while the other nodes' requests got blocked. I was wondering whether the decentralized NFS server cluster caused this race condition.
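To make that pattern easier to see across several dump files, here is a minimal Python sketch that tallies ACTIVE and BLOCKED inodelk entries per client host. It is not part of Gluster: the function name summarize_inodelks is hypothetical, and it assumes that each lock entry sits on one line and that the hostname is the first dash-separated field of connection-id (which would break for hostnames containing a dash).

```python
import re
from collections import Counter

# One lock entry looks like:
#   inodelk.inodelk[1](BLOCKED)=type=WRITE, ..., connection-id=sm12-4956-..., blocked at ...
LOCK_RE = re.compile(
    r"inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\)=.*?connection-id=([^,\s]+)"
)


def summarize_inodelks(dump_text):
    """Count ACTIVE and BLOCKED inode locks per client host (hypothetical helper)."""
    counts = {"ACTIVE": Counter(), "BLOCKED": Counter()}
    for state, conn_id in LOCK_RE.findall(dump_text):
        # Assumption: the hostname is everything before the first "-".
        host = conn_id.split("-", 1)[0]
        counts[state][host] += 1
    return counts


# The four entries quoted from the statedump in this mail.
sample = """\
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=3cc5a0e4627f0000, client=0x7f03b8082150, connection-id=hw10-48926-2016/04/13-07:23:01:395332-mainvol-client-0-0, blocked at 2016-04-13 08:30:11, granted at 2016-04-13 08:31:09
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=80894744477f0000, client=0x7f03b80a8ef0, connection-id=sm12-4956-2016/04/13-07:22:44:529032-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=1827fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=6c20fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
"""

counts = summarize_inodelks(sample)
for state in ("ACTIVE", "BLOCKED"):
    for host, n in sorted(counts[state].items()):
        print(state, host, n)
```

For the four entries quoted above, this reports one ACTIVE lock held through hw10 and three BLOCKED requests (one from sm12, two from sm16). Running it over the full dump files would show whether the same host always ends up holding the lock.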
Here is the volume info. I tweaked a lot of options, trying to boost performance while keeping the volume stable. I have run into this inode lock problem 4 times since I sent the first e-mail in this thread asking for help.
======================
Volume Name: mainvol
Type: Distributed-Disperse
Volume ID: 2e190c59-9e28-43a5-b22a-24f75e9a580b
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: sm11:/mnt/disk1/mainvol
Brick2: sm12:/mnt/disk1/mainvol
Brick3: sm13:/mnt/disk1/mainvol
Brick4: sm14:/mnt/disk2/mainvol
Brick5: sm15:/mnt/disk2/mainvol
Brick6: sm16:/mnt/disk2/mainvol
Brick7: sm11:/mnt/disk2/mainvol
Brick8: sm12:/mnt/disk2/mainvol
Brick9: sm13:/mnt/disk2/mainvol
Brick10: sm14:/mnt/disk1/mainvol
Brick11: sm15:/mnt/disk1/mainvol
Brick12: sm16:/mnt/disk1/mainvol
Options Reconfigured:
performance.nfs.quick-read: on
performance.nfs.io-cache: on
performance.nfs.io-threads: on
performance.client-io-threads: on
performance.nfs.read-ahead: on
performance.nfs.write-behind-window-size: 4MB
performance.nfs.stat-prefetch: on
performance.stat-prefetch: on
nfs.acl: off
features.lock-heal: on
features.grace-timeout: 120
server.outstanding-rpc-limit: 128
network.remote-dio: on
performance.io-cache: true
performance.readdir-ahead: on
auth.allow: 172.16.135.*
performance.cache-size: 16GB
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.lookup-optimize: on
cluster.readdir-optimize: on
nfs.rpc-auth-allow: 172.168.135.*,127.0.0.1,::1
======================

The locked node cannot be reached now, so I collected the NFS statedumps of the other nodes. I also included /var/log/gluster from one node (sm11) as a representative. The archive is too big for the mailing list, so it is available at https://dl.dropboxusercontent.com/u/56671522/inodelock.tar.xz
Best wishes,
Chen

On 4/13/2016 6:29 PM, Ashish Pandey wrote:
Hi Chen,

What do you mean by "instantly get inode locked and teared down the whole cluster" ? Do you mean that whole disperse volume became unresponsive?

I don't have much idea about features.lock-heal so can't comment how can it help you.

Could you please explain second part of your mail? What exactly are you trying to do and what is the setup? Also volume info, logs statedumps might help.

-----
Ashish

------------------------------------------------------------------------
From: "Chen Chen" <chenchen@xxxxxxxxxxxxxxxx>
To: "Ashish Pandey" <aspandey@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Wednesday, April 13, 2016 3:26:53 PM
Subject: Re: Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish and other Gluster Users,

When I put some heavy IO load onto my cluster (a rsync operation, ~600MB/s), one of the node instantly get inode locked and teared down the whole cluster. I've already turned on "features.lock-heal" but it didn't help.

My clients is using a round-robin tactic to mount servers, hoping to average the pressure. Could it be caused by a race between NFS servers on different nodes? Should I instead create a dedicated NFS Server with huge memory, no brick, and multiple Ethernet cables?

I really appreciate any help from you guys.

Best wishes,
Chen

PS. Don't know why the native fuse client is 5 times inferior than the old good NFSv3.
On 4/4/2016 6:11 PM, Ashish Pandey wrote:

Hi Chen,

As I suspected, there are many blocked call for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.

=============================================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
=============================================

This could be the cause of hang.

Possible Workaround -
If there is no IO going on for this volume, we can restart the volume using - gluster v start <volume-name> force. This will restart the nfs process too which will release the locks and we could come out of this issue.

Ashish
--
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen@xxxxxxxxxxxxxxxx
Web: www.smartquerier.com
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users