Re: glusterfs3.2.7 split brain on a server, while it's normal on another server


 



Pranith, thank you very much for your reply.

 

The xattrs of the file in split-brain are the same on all three bricks. I confirmed this when I first found the error.

 

[root@bj-nx-cip-w86 000]# getfattr -d -m . -e hex 095

# file: 095

trusted.afr.gfs1-client-15=0x000000000000000000000000

trusted.afr.gfs1-client-16=0x000000000000000000000000

trusted.afr.gfs1-client-17=0x000000000000000000000000

trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc

 

[root@bj-nx-cip-w76 000]# getfattr -d -m . -e hex 095

# file: 095

trusted.afr.gfs1-client-15=0x000000000000000000000000

trusted.afr.gfs1-client-16=0x000000000000000000000000

trusted.afr.gfs1-client-17=0x000000000000000000000000

trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc

 

[root@bj-nx-cip-w66 000]# getfattr -d -m . -e hex 095

# file: 095

trusted.afr.gfs1-client-15=0x000000000000000000000000

trusted.afr.gfs1-client-16=0x000000000000000000000000

trusted.afr.gfs1-client-17=0x000000000000000000000000

trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
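For reference, here is a minimal Python sketch of how these values can be read. It assumes the usual AFR changelog layout of three big-endian 32-bit pending counters (data, metadata, entry) for each `trusted.afr.<volume>-client-N` key. `parse_getfattr`, `decode_afr_changelog` and `has_pending` are hypothetical helper names, not part of GlusterFS.

```python
import struct


def parse_getfattr(text):
    """Collect hex-encoded xattrs from `getfattr -d -m . -e hex` output."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value.startswith("0x"):
            attrs[name] = bytes.fromhex(value[2:])
    return attrs


def decode_afr_changelog(raw):
    """Assumed layout: three big-endian 32-bit pending counters."""
    if len(raw) != 12:
        raise ValueError("expected a 12-byte AFR changelog, got %d bytes" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def has_pending(attrs):
    """True if any trusted.afr.* changelog has a non-zero counter."""
    return any(
        any(decode_afr_changelog(raw).values())
        for name, raw in attrs.items()
        if name.startswith("trusted.afr.")
    )


# Output from bj-nx-cip-w86 (identical on w76 and w66).
sample = """\
# file: 095
trusted.afr.gfs1-client-15=0x000000000000000000000000
trusted.afr.gfs1-client-16=0x000000000000000000000000
trusted.afr.gfs1-client-17=0x000000000000000000000000
trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
"""

attrs = parse_getfattr(sample)
for name in sorted(attrs):
    if name.startswith("trusted.afr."):
        print(name, decode_afr_changelog(attrs[name]))
print("pending operations:", has_pending(attrs))
```

On the output above, every counter decodes to zero, so no brick records pending operations against another. Together with the matching `trusted.gfid`, this is consistent with the suspicion that the problem was in state held by the client rather than on disk.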

 

I think the GlusterFS client may be caching some information, because after I unmount and remount the volume, the error no longer occurs.

 

 

 

From: Pranith Kumar K [mailto:pkarampu@xxxxxxxxxx]
Sent: Wednesday, January 09, 2013 6:06 PM
To: Song
Cc: gluster-devel@xxxxxxxxxx
Subject: Re: glusterfs3.2.7 split brain on a server, while it's normal on another server

 

On 01/09/2013 11:03 AM, Song wrote:

Hi,

 

We have a GlusterFS cluster running version 3.2.7. The volume info is as follows:

 

Volume Name: gfs1

Type: Distributed-Replicate

Status: Started

Number of Bricks: 94 x 3 = 282

Transport-type: tcp

 

 

We use native (FUSE) mounts of the volume on all cluster servers. When we access the file “/XMTEXT/gfs1_000/000/000/095” on one server, we get a split-brain error.

However, we can access the same file from another server.

Also, after remounting the volume on the failing server, the same file can be accessed without error.

Has GlusterFS cached some information? This has happened more than once.

 

The log when the split-brain error occurs is as follows:

 

[2013-01-07 09:57:29.554505] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.

[2013-01-07 09:57:29.554566] I [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: background  data gfid self-heal triggered. path: /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:29.555299] I [afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata check

[2013-01-07 09:57:29.555507] I [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 0-gfs1-replicate-5: split brain found, aborting selfheal of /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:29.555531] E [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 0-gfs1-replicate-5: background  data gfid self-heal failed on /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:35.598229] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.

[2013-01-07 09:57:35.598282] I [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: background  data gfid self-heal triggered. path: /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:35.598939] I [afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata check

[2013-01-07 09:57:35.599139] I [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 0-gfs1-replicate-5: split brain found, aborting selfheal of /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:35.599176] E [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 0-gfs1-replicate-5: background  data gfid self-heal failed on /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:38.192819] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.

[2013-01-07 09:57:38.192875] I [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: background  data gfid self-heal triggered. path: /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:38.193486] I [afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata check

[2013-01-07 09:57:38.193708] I [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 0-gfs1-replicate-5: split brain found, aborting selfheal of /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:38.193731] E [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 0-gfs1-replicate-5: background  data gfid self-heal failed on /XMTEXT/gfs1_000/000/000/095

[2013-01-07 09:57:38.193937] W [afr-open.c:168:afr_open] 0-gfs1-replicate-5: failed to open as split brain seen, returning EIO

[2013-01-07 09:57:38.194033] W [fuse-bridge.c:693:fuse_fd_cbk] 0-glusterfs-fuse: 3162527: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1 (Input/output error)

[2013-01-07 10:08:12.569821] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.

[2013-01-07 10:08:12.569891] I [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: background  data gfid self-heal triggered. path: /XMTEXT/gfs1_000/000/000/095

[2013-01-07 10:08:12.571538] I [afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata check

[2013-01-07 10:08:12.572684] I [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 0-gfs1-replicate-5: split brain found, aborting selfheal of /XMTEXT/gfs1_000/000/000/095

[2013-01-07 10:08:12.572732] E [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 0-gfs1-replicate-5: background  data gfid self-heal failed on /XMTEXT/gfs1_000/000/000/095

[2013-01-07 10:08:12.580006] W [afr-open.c:168:afr_open] 0-gfs1-replicate-5: failed to open as split brain seen, returning EIO

[2013-01-07 10:08:12.580103] W [fuse-bridge.c:693:fuse_fd_cbk] 0-glusterfs-fuse: 3164490: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1 (Input/output error)
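When the same path fails repeatedly, a small script can summarize a client log like this one. The following is a hypothetical Python helper. It only matches the two message formats that appear in this 3.2.7 log (`split brain detected during lookup of ...` and `OPEN() ... => -1 (Input/output error)`), and other GlusterFS versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns match the message text seen in this 3.2.7 client log.
SPLIT_RE = re.compile(r"split brain detected during lookup of (\S+?)\.?$")
EIO_RE = re.compile(r"OPEN\(\) (\S+) => -1 \(Input/output error\)")


def summarize(lines):
    """Count split-brain lookups and EIO opens per path."""
    lookups, eio_opens = Counter(), Counter()
    for line in lines:
        line = line.rstrip()
        m = SPLIT_RE.search(line)
        if m:
            lookups[m.group(1)] += 1
        m = EIO_RE.search(line)
        if m:
            eio_opens[m.group(1)] += 1
    return lookups, eio_opens


log = """\
[2013-01-07 09:57:38.192819] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 09:57:38.194033] W [fuse-bridge.c:693:fuse_fd_cbk] 0-glusterfs-fuse: 3162527: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1 (Input/output error)
[2013-01-07 10:08:12.569821] W [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5: split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
"""

lookups, eio_opens = summarize(log.splitlines())
for path, n in lookups.items():
    print(path, "split-brain lookups:", n, "EIO opens:", eio_opens[path])
```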

 

Thanks!

 

 

 

 




_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel

Song,
      It seems like the file is in gfid split-brain. To confirm, could you provide the output of the following command on each of the backend bricks:
getfattr -d -m . -e hex <file-in-split-brain>

Pranith.
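A gfid split-brain means the replicas of one file carry different `trusted.gfid` values. As an illustration, the comparison requested above could be scripted as follows. This sketch assumes the `getfattr` output from each brick has already been collected into a string. `gfid_of` and `gfid_split_brain` are hypothetical helper names. A brick with no `trusted.gfid` at all is reported but not treated as a mismatch.

```python
def gfid_of(getfattr_text):
    """Return the hex trusted.gfid from `getfattr -e hex` output, or None."""
    for line in getfattr_text.splitlines():
        line = line.strip()
        if line.startswith("trusted.gfid="):
            return line.split("=", 1)[1].lower()
    return None


def gfid_split_brain(outputs):
    """outputs maps a brick label to its getfattr text.

    Returns (mismatch, gfids): mismatch is True when two or more bricks
    report different gfids; bricks without a gfid are left out of the check.
    """
    gfids = {brick: gfid_of(text) for brick, text in outputs.items()}
    distinct = {g for g in gfids.values() if g is not None}
    return len(distinct) > 1, gfids


same = "trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc\n"
outputs = {"bj-nx-cip-w86": same, "bj-nx-cip-w76": same, "bj-nx-cip-w66": same}
mismatch, gfids = gfid_split_brain(outputs)
print("gfid split-brain:", mismatch)
```

All three bricks in Song's outputs report the same gfid, so this check returns False for file 095.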

