Re: Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish,

There was heavy IO load on the cluster when it locked up. I fear the processes waiting for IO will all crash.

Furthermore, both "start force" and "stop" returned "Error : Request timed out". I'm not sure whether that was caused by the semi-dead node. I'll hard-reset the node tomorrow and see if it helps.

Also, what caused the lock-up, and how can I avoid it? Any advice is appreciated.

Best wishes,
Chen

On 4/4/2016 6:11 PM, Ashish Pandey wrote:
Hi Chen,

As I suspected, there are many blocked inodelk calls in sm11/mnt-disk1-mainvol.31115.dump.1459760675.

=============================================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
=============================================

This could be the cause of the hang.
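To spot this pattern across several dump files rather than by eye, a small parser can pick out every inode section that has BLOCKED inodelk entries and report which client host holds the ACTIVE lock and which hosts are waiting. This is a minimal sketch that assumes the line format shown in the excerpt above (a `path=` line followed by `inodelk.inodelk[N](STATE)=...` lines carrying a `connection-id=` field whose first dash-separated token is the client hostname); the function names `parse_inodelks` and `contended` are hypothetical, not part of any Gluster tooling.

```python
import re
import sys
from collections import defaultdict

# Assumed format, taken from the statedump excerpt in this thread:
# inodelk.inodelk[1](BLOCKED)=type=WRITE, ..., connection-id=hw10-17315-..., blocked at ...
INODELK_RE = re.compile(r"^inodelk\.inodelk\[\d+\]\((?P<state>ACTIVE|BLOCKED)\)=(?P<body>.*)$")
CONN_RE = re.compile(r"connection-id=([^,\s]+)")


def parse_inodelks(dump_text):
    """Return {path: [(state, client_host), ...]} for each inode section."""
    result = defaultdict(list)
    path = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            path = None  # a new section starts; wait for its path= line
        elif line.startswith("path="):
            path = line[len("path="):]
        else:
            m = INODELK_RE.match(line)
            if m and path is not None:
                conn = CONN_RE.search(m.group("body"))
                # Hypothetical assumption: the host is the text before the first '-'.
                host = conn.group(1).split("-", 1)[0] if conn else None
                result[path].append((m.group("state"), host))
    return dict(result)


def contended(dump_text):
    """Paths with at least one BLOCKED inodelk, with lock holders and waiters."""
    out = {}
    for path, locks in parse_inodelks(dump_text).items():
        waiters = [host for state, host in locks if state == "BLOCKED"]
        if waiters:
            holders = [host for state, host in locks if state == "ACTIVE"]
            out[path] = {"holders": holders, "waiters": waiters}
    return out


if __name__ == "__main__":
    for dump_file in sys.argv[1:]:
        with open(dump_file) as f:
            for path, info in contended(f.read()).items():
                print(f"{dump_file}: {path} held by {info['holders']}, waiting: {info['waiters']}")
```

Run against the excerpt above, it would report GenomeAnalysisTK.jar as held by sm12 with hw10, sm14 and sm15 waiting. Lock domains are not separated here, so a real dump with several domains per inode may need finer grouping.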
Possible workaround:
If there is no IO going on for this volume, we can restart the volume with "gluster v start <volume-name> force". This also restarts the NFS process, which should release the locks and get us out of this state.

Ashish




----- Original Message -----
From: "Chen Chen" <chenchen@xxxxxxxxxxxxxxxx>
To: "Ashish Pandey" <aspandey@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Monday, April 4, 2016 2:56:37 PM
Subject: Re:  Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish,

Yes, I only uploaded the directory of one node (sm11). All nodes show
the same kind of errors at roughly the same time.

I'm sending the info for the other 5 nodes. Logs of all bricks (except
the "dead" 1x2) are also attached. One of the nodes (sm16) refuses to let
me ssh into it, yet volume status says it is still alive and showmount on
it still works.

The node "hw10" works as a pure NFS server and doesn't have any bricks.

The dump file and logs are again in my Dropbox (3.8M)
https://dl.dropboxusercontent.com/u/56671522/statedump.tar.xz

Best wishes,
Chen

On 4/4/2016 4:27 PM, Ashish Pandey wrote:

Hi Chen,

Looking at the logs in mnt-disk1-mainvol.log, I suspect this hang is caused by inode lock contention.
I think the logs provided are for one brick only.
To confirm it, we need statedumps for all the brick processes and the NFS server:

For bricks: gluster volume statedump <volname>
For nfs server: gluster volume statedump <volname> nfs

The directory where statedump files are created can be found with the 'gluster --print-statedumpdir' command.
If it does not exist, create it.
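The collection steps above could be scripted roughly as follows. This is a minimal sketch, not an official tool: it only wraps the three CLI invocations quoted in this thread (`gluster --print-statedumpdir`, `gluster volume statedump <volname>`, and `gluster volume statedump <volname> nfs`), and the helper names `statedump_commands` and `collect_statedumps` are hypothetical. By default it runs in a dry-run mode that just returns the commands. Creating the dump directory only affects the node the script runs on; on a multi-node cluster like this one the directory would also need to exist on every brick server.

```python
import os
import subprocess


def statedump_commands(volname):
    """The statedump invocations suggested in this thread, as argv lists."""
    return [
        ["gluster", "volume", "statedump", volname],         # all brick processes
        ["gluster", "volume", "statedump", volname, "nfs"],  # the Gluster NFS server
    ]


def collect_statedumps(volname, dry_run=True):
    """Return the commands (dry run) or run them and return the dump directory."""
    if dry_run:
        return statedump_commands(volname)
    # Ask gluster where dump files go, and create that directory locally if missing.
    dump_dir = subprocess.run(
        ["gluster", "--print-statedumpdir"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    os.makedirs(dump_dir, exist_ok=True)
    for cmd in statedump_commands(volname):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return dump_dir
```

For example, collect_statedumps("mainvol") only lists the two statedump commands, while collect_statedumps("mainvol", dry_run=False) actually runs them.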

Logs for all the bricks are also required.
You should also try restarting the volume, which could resolve this hang if it is caused by inode locks:

gluster volume start <volname> force

Ashish

--
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
        Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen@xxxxxxxxxxxxxxxx
Web: www.smartquerier.com

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users