Re: Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Chen,

Looking at the logs in mnt-disk1-mainvol.log and mnt-disk1-mainvol.log, I suspect this hang is caused by inode lock contention.
I think the logs provided are from one brick only.
To confirm this, we need statedumps for all the brick processes and the NFS server.

For bricks: gluster volume statedump <volname>
For nfs server: gluster volume statedump <volname> nfs

The directory where statedump files are created can be found with the 'gluster --print-statedumpdir' command.
If this directory does not exist, create it.
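
The collection steps above could be scripted roughly as follows. This is a minimal sketch, assuming bash and root access on a server in the cluster. The helper names collect_statedumps and count_blocked_locks are made up for illustration and are not part of gluster. The assumption that blocked lock requests show up in the dump files tagged "(BLOCKED)" should be checked against the output of your version.

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical, not gluster commands.

# Trigger statedumps for all bricks and the NFS server of one volume.
# Prints the statedump directory on stdout.
collect_statedumps() {
    local volname="$1"
    local dumpdir
    dumpdir="$(gluster --print-statedumpdir)" || return 1
    # Create the statedump directory if it is not present.
    mkdir -p "$dumpdir" || return 1
    # Send the CLI's own messages to stderr so stdout only carries the path.
    gluster volume statedump "$volname" >&2 || return 1
    gluster volume statedump "$volname" nfs >&2 || return 1
    echo "$dumpdir"
}

# Count lines that mention a blocked lock request in the dump files.
# Assumption: such entries carry the literal tag "(BLOCKED)".
count_blocked_locks() {
    local dumpdir="$1"
    grep -rhF '(BLOCKED)' "$dumpdir" 2>/dev/null | wc -l | tr -d ' '
}
```

For example, dir="$(collect_statedumps mainvol)" followed by count_blocked_locks "$dir" gives a quick first look; a non-zero count would be consistent with lock contention, but the dumps still need to be read by hand.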

Logs for all the bricks are also required.
You could also try restarting the volume, which may resolve the hang if it is caused by an inode lock.

gluster volume start <volname> force
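
A rough way to run the forced start and then check whether the mount answers again is sketched below. It assumes bash and the coreutils "timeout" command; restart_and_check is a hypothetical helper name, and the 10-second limit is an arbitrary choice. Note that a process stuck in uninterruptible sleep on a dead NFS mount may not be killable by "timeout", so treat this as a convenience check only.

```shell
#!/bin/bash
# Sketch only: restart_and_check is a hypothetical helper, not a gluster command.

restart_and_check() {
    local volname="$1"
    local mountpoint="$2"
    # Force-start the volume, then print the brick and NFS server status.
    gluster volume start "$volname" force || return 1
    gluster volume status "$volname" || return 1
    # "ls" hung on the client before; give it 10 seconds to answer.
    if timeout 10 ls "$mountpoint" > /dev/null; then
        echo "mount responsive"
    else
        echo "mount still hanging or unreachable"
        return 1
    fi
}
```

Usage would look like: restart_and_check <volname> /path/to/mountpoint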

Ashish


----- Original Message -----
From: "Chen Chen" <chenchen@xxxxxxxxxxxxxxxx>
To: "Ashish Pandey" <aspandey@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Sunday, April 3, 2016 2:13:22 PM
Subject: Re:  Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish Pandey,

After some investigation I updated the server from 3.7.6 to 3.7.9. I 
also switched from native fuse to NFS mount (which boosted the 
performance a lot when I tested) on April 1st.

Then after two days' running, the cluster appeared to be locked. "ls" 
hangs, no network usage, volume profile showed no r/w activity on 
bricks. "dmesg" showed the NFS went dead in 12 hrs (Apr 2 01:13), but 
"showmount" and "volume status" said NFS server is responding and all 
bricks are alive.

I'm not sure what had happened (glustershd.log and nfs.log didn't show 
anything interesting), so I dumped the whole log folder instead. It was 
a bit too large (5MB, filled by Error and Warning) and my mail was 
rejected multiple times by the mailing list. I can only attach a 
snapshot of all logs. You can grab the full version at 
https://dl.dropboxusercontent.com/u/56671522/glusterfs.tar.xz instead.

The volume profile info is also attached. Hope it helps.

Best wishes,
Chen

On 3/27/2016 2:38 AM, Ashish Pandey wrote:
> Hi Chen,
>
> Could you please send us the following logs:
> 1 - brick logs - under /var/log/glusterfs/bricks/
> 2 - mount logs
>
> Also, please tell us what kind of I/O was happening (read, write, unlink, rename on different mounts) so we can understand this issue better.
>
> ---
> Ashish
>
> ----- Original Message -----
> From: "陈陈" <chenchen@xxxxxxxxxxxxxxxx>
> To: gluster-users@xxxxxxxxxxx
> Sent: Friday, March 25, 2016 8:59:04 AM
> Subject:  Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Everyone,
>
> I have a "2 x (4 + 2) = 12 Distributed-Disperse" volume. After upgrading
> to 3.7.8 I noticed the volume is frequently out of service. The
> glustershd.log is flooded by:
>
> [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching
> xdata in answers of 'LOOKUP'
> [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed
> on some subvolumes (up=3F, mask=3F, remaining=0, good=1E, bad=21)
> [ec-common.c:71:ec_heal_report] 0-mainvol-disperse-1: Heal failed
> [Invalid argument]
> [ec-combine.c:206:ec_iatt_combine] 0-mainvol-disperse-0: Failed to
> combine iatt (inode: xxx, links: 1-1, uid: 1000-1000, gid: 1000-1000,
> rdev: 0-0, size: xxx-xxx, mode: 100600-100600)
>
> in normal working state, and sometimes 1000+ lines of:
>
> [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-7: remote
> operation failed. Path: <gfid:xxxx> (xxxx) [Too many open files]
>
> and the brick went offline. "top open" showed "Max open fds: 899195".
>
> Can anyone tell me what happened and what I should do? I was trying
> to deal with the terrible IOPS problem but things got even worse.
>
> Each server has 2 x E5-2630v3 (32 threads/server), 32GB RAM. Additional
> information is in the attachments. Many thanks.
>
> Sincerely yours,
> Chen
>
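
For reference, the hexadecimal values in the ec_check_status line quoted above (up=3F, mask=3F, good=1E, bad=21) can be read as bitmasks. The sketch below assumes that bit i stands for subvolume i of the disperse set, which is my reading of the message and not something stated in this thread. decode_mask is a hypothetical helper written for illustration.

```shell
#!/bin/bash
# Sketch only: decode_mask is a hypothetical helper, not a gluster tool.
# Prints the zero-based positions of the bits set in a hexadecimal mask.
decode_mask() {
    local value=$(( 16#$1 ))
    local i=0
    local out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            # Append the bit position, separated by a space after the first.
            out="$out${out:+ }$i"
        fi
        value=$(( value >> 1 ))
        i=$(( i + 1 ))
    done
    echo "$out"
}

decode_mask 3F   # up:   0 1 2 3 4 5
decode_mask 1E   # good: 1 2 3 4
decode_mask 21   # bad:  0 5
```

Under that assumption, all six subvolumes were up, four of them agreed on the answer, and the first and last ones disagreed.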

-- 
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
         Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen@xxxxxxxxxxxxxxxx
Web: www.smartquerier.com

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users