On 10/19/2014 01:36 PM, Anirban Ghoshal wrote:
It is possible, yes, because these are actually a kind of log file. I suppose, as with other logging frameworks, these files can remain open for a considerable period, and then get renamed to support log-rotate semantics.
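
To illustrate the rotate-by-rename pattern I mean, a minimal sketch (the key point being that an already-open fd keeps writing to the same inode across the rename):

exec 3>>abc.log        # the logger holds the file open for appending
mv abc.log abc.log.1   # rotation renames it; fd 3 still points at the same inode
echo "entry" >&3       # this write now lands in abc.log.1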
That said, I might need to check with the team that actually manages the logging framework to be sure; I only take care of the file-system stuff. I can tell you for sure on Monday.
If it is the same race that you mention, is there a fix for it?
Thanks,
Anirban
I am working on the fix.
RCA:
0) Let's say the file 'abc.log' is opened for writing on the replica pair (brick-0, brick-1).
1) brick-0 goes down.
2) abc.log is renamed to abc.log.1.
3) brick-0 comes back up.
4) A re-open of the old abc.log happens from the mount to brick-0.
5) Self-heal kicks in, deletes the old abc.log, and creates and syncs abc.log.1.
6) But the mount is still writing to the deleted old abc.log on brick-0, so abc.log.1 stays at the same size on brick-0 while abc.log.1 keeps growing on brick-1. This leads to a size-mismatch split-brain on abc.log.1.
The race is between steps 4) and 5). If 5) happens before 4), no split-brain is observed.
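
For anyone trying to picture the sequence, here is a rough shell timeline of the scenario (a sketch only; it assumes a two-node replica volume named testvol and a fuse mount at /mnt/testvol, both placeholders, and as noted below the actual race window is hard to hit without breakpoints):

# On the client: the logger holds abc.log open for writing (step 0)
exec 3>>/mnt/testvol/abc.log

# On node-0: brick-0 goes down (step 1)
kill -9 <brick-0-pid>

# On the client: the log is rotated while brick-0 is offline (step 2)
mv /mnt/testvol/abc.log /mnt/testvol/abc.log.1

# On node-0: brick-0 comes back (step 3)
gluster volume start testvol force

# Steps 4) and 5) now race inside glusterfs: the client's re-open of the
# old abc.log on brick-0 versus self-heal deleting abc.log and creating
# abc.log.1. If the re-open wins, writes through the open fd land in the
# deleted file on brick-0 and abc.log.1 stops growing there:
echo "another log entry" >&3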
Work-around:
0) Take a backup of the good abc.log.1 file from brick-1 (just being paranoid).
Do either of the following two steps to make sure the stale file that is open gets closed (a command sketch follows these steps):
1-a) Take the brick process with the bad file down using kill -9 <brick-pid> (in my example, brick-0).
1-b) Introduce a temporary disconnect between the mount and brick-0.
(I would choose 1-a.)
2) Remove the bad file (abc.log.1) and its gfid-backend-file from brick-0.
3) Bring the brick back up (gluster volume start <volname> force) / restore the connection, and let it heal by doing a 'stat' on the file abc.log.1 from the mount.
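
A concrete sketch of the work-around, assuming brick-0's backend path is /bricks/testvol on node-0 and the mount is at /mnt/testvol (both placeholders); for a regular file, the gfid backend file is the hard link under .glusterfs/<first-2-hex-chars-of-gfid>/<next-2>/<gfid-as-uuid> on the brick:

# 0) On node-1: back up the good copy from brick-1
cp /bricks/testvol/abc.log.1 /root/abc.log.1.backup

# 1-a) On node-0: find and kill the brick process holding the stale open file
gluster volume status testvol   # shows each brick's PID
kill -9 <brick-pid>

# 2) On node-0: find the file's gfid, then remove the file and its backend link
getfattr -n trusted.gfid -e hex /bricks/testvol/abc.log.1
rm /bricks/testvol/abc.log.1
rm /bricks/testvol/.glusterfs/<aa>/<bb>/<gfid-as-uuid>

# 3) On node-0: restart the brick, then trigger heal from the mount
gluster volume start testvol force
stat /mnt/testvol/abc.log.1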
This bug has existed since 2012, from the first time I implemented rename/hard-link self-heal. It is difficult to re-create; I have to put break-points at several places in the process to hit the race.
Pranith
On 10/18/2014 04:36 PM, Anirban Ghoshal wrote:
Hi,
Yes, they do, and considerably. I'd forgotten to mention that in my last email. Their mtimes, however, as far as I could tell on the separate servers, seemed to coincide.
Thanks,
Anirban
Are these files always open? And is it possible that the file could have been renamed while one of the bricks was offline? I know of a race which can introduce this one; just trying to find out if it is the same case.
Pranith
hi,
Could you see if the size of the file mismatches?
Pranith
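
A quick way to compare is to run, on each server, something like the following against that server's own brick path (/bricks/testvol is a placeholder):

stat -c '%s %Y %n' /bricks/testvol/SECLOG/20140908.d/SECLOG_00000000000000427425_00000000000000000000.log

%s prints the size in bytes and %Y the mtime, so one command covers both checks discussed here.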
On 10/18/2014 04:20 AM, Anirban Ghoshal wrote:
Hi everyone,
I have this really confusing split-brain here that's bothering me. I am running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2 volume 'testvol'. It seems I cannot read/stat/edit the file in question, and `gluster volume heal testvol info split-brain` shows nothing. Here are the logs from the fuse-mount for the volume:
[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 4560969: FLUSH() ERR => -1 (Input/output error)
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8529d20 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561103: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8607ee0 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561104: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8066f30 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561105: READ => -1 (Input/output error)
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c860bcf0 & waitq = 0x7fd5c863b1f0
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561107: READ => -1 (Input/output error)
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c85fd120 & waitq = 0x7fd5c8067d40
[2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561109: READ => -1 (Input/output error)
[2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 0-testvol-replicate-0: failed to open as split brain seen, returning EIO
[2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 0-glusterfs-fuse: 4561142: OPEN() /SECLOG/20140908.d/SECLOG_00000000000000427425_00000000000000000000.log => -1 (Input/output error)
Could somebody please give me some clue on where to begin? I checked the xattrs on /SECLOG/20140908.d/SECLOG_00000000000000427425_00000000000000000000.log and it seems the changelogs are [0, 0] on both replicas, and the gfids match.
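
For concreteness, the xattr check is along these lines, run against the file's path on each brick (/bricks/testvol is a placeholder for the real brick path):

getfattr -d -m . -e hex /bricks/testvol/SECLOG/20140908.d/SECLOG_00000000000000427425_00000000000000000000.log

On a replica 2 volume the AFR changelogs show up as trusted.afr.testvol-client-0 and trusted.afr.testvol-client-1 (all-zero values are the "[0, 0]" above), and trusted.gfid should be identical on both bricks.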
Thank you very much for any help on this.
Anirban
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users