On 09/02/17 06:07, Nag Pavan Chilakam wrote:
----- Original Message -----
From: "lejeczek" <peljasz@xxxxxxxxxxx>
To: "Nag Pavan Chilakam" <nchilaka@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Wednesday, 8 February, 2017 7:15:29 PM
Subject: Re: Input/output error - would not heal
On 08/02/17 06:11, Nag Pavan Chilakam wrote:
"gluster volume info" and "gluster vol status" would help in us debug faster.
However, coming to the gfid mismatch: yes, the file "abbreviations.log" (I assume the other brick's copy is also named "abbreviations.log" and not "bbreviations.log"... a typo?) is in a gfid mismatch, leading to the I/O error (gfid split-brain).
Resolving data and metadata split-brains from the backend brick is not recommended.
But in the case of a gfid split-brain (like the file abbreviations.log), the only method available is resolving it from the backend brick.
You can read more about this at http://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/?highlight=gfid (see the "Fixing Directory entry split-brain" section).
(There is already an open bug for resolving gfid split-brain via the CLI.)
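For reference, a rough sketch (bash) of the backend steps that doc describes, assuming you decide to discard the copy whose gfid is 6e9a7fa1-... as seen further down in this thread; which copy to keep is your call, and the file's path under the brick is hypothetical:

BRICK=/__.aLocalStorages/3/0-GLUSTERs/0-USER
GFID=6e9a7fa1-bfbe-4a59-ad06-a78ee1625649
# on the brick holding the copy to discard, remove both the named file...
rm "$BRICK/aUser/abbreviations.log"                    # hypothetical path
# ...and its gfid hardlink under .glusterfs/<byte1>/<byte2>/<gfid>:
rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# then trigger a heal (or just stat the file from a client mount):
gluster volume heal USER-HOME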
I've read that doc; however, I'm not sure what to do with
the bits that are not mentioned in it, namely when some
xattr does not exist on one copy but does on the
other, like:
3]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
# file: .vim.backup/.bash_profile.swp
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.USER-HOME-client-0=0x000000010000000100000000
trusted.afr.USER-HOME-client-5=0x000000010000000100000000
2]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
# file: .vim.backup/.bash_profile.swp
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.USER-HOME-client-5=0x000000010000000100000000
trusted.afr.USER-HOME-client-6=0x000000010000000100000000
That means the file .bash_profile.swp is possibly in a data and metadata split-brain.
I need to understand the volume configuration; that is why I am asking for the volume info.
From the output above, I am guessing that it is a 3x volume (3 replica copies).
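For reference, decoding those trusted.afr values (three big-endian 4-byte counters: pending data, metadata and entry operations):

trusted.afr.USER-HOME-client-0=0x 00000001 00000001 00000000
                                  data=1   meta=1   entry=0

Non-zero data/metadata counters on both bricks, each blaming the other copy, is the classic data+metadata split-brain signature.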
as per my first email:
...
v3.9. It's a two-brick volume; it was three, but I removed one, I
think a few hours before the problem was first noticed.
...
and vol info:
Volume Name: USER-HOME
Type: Replicate
Volume ID: 9e4ed9b7-373a-413b-bc82-b6f978e82ec4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.5.6.100:/__.aLocalStorages/3/0-GLUSTERs/0-USER
Brick2: 10.5.6.49:/__.aLocalStorages/3/0-GLUSTERs/0-USER
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
many thanks,
L.
Unless the doc talks about it and I've gone (temporarily)
blind; but if it does not, it would be great to include
more scenarios/cases there.
many thx.
L.
thanks,
nagpavan
----- Original Message -----
From: "lejeczek" <peljasz@xxxxxxxxxxx>
To: "Nag Pavan Chilakam" <nchilaka@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, 7 February, 2017 10:53:07 PM
Subject: Re: Input/output error - would not heal
On 07/02/17 12:50, Nag Pavan Chilakam wrote:
Hi,
Can you help us with more information on the volume, like volume status and volume info?
One reason for a "transport endpoint is not connected" error is that a brick could be down.
Also, I see that the syntax used for healing is wrong.
You need to use it as below:
gluster v heal <vname> split-brain source-brick <brick path> <filename relative to the brick path, i.e. treating the brick path as />
In your case, if the brick path is "/G-store/1" and the file to be healed is "that_file", then use the syntax below (here I am assuming "that_file" lies directly under the brick path):
gluster volume heal USER-HOME split-brain source-brick 10.5.6.100:/G-store/1 /that_file
That was just my copy-and-paste typo; it still does not heal.
Interestingly, that file is not reported by heal at all.
I've replied to the thread "GFID Mismatch - Automatic Correction ?";
I think my problem is similar. Here is a file the heal
actually sees:
$ gluster vol heal USER-HOME info
Brick 10.5.6.100:/__.aLocalStorages/3/0-GLUSTERs/0-USER.HOME
/aUser/.vim.backup/.bash_profile.swp
Status: Connected
Number of entries: 1
Brick 10.5.6.49:/__.aLocalStorages/3/0-GLUSTERs/0-USER.HOME
/aUser/.vim.backup/.bash_profile.swp
Status: Connected
Number of entries: 1
I'm copying and pasting what I said in my reply to that thread:
...
yep, I'm seeing the same, as follows:
3]$ getfattr -d -m . -e hex .
# file: .
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.USER-HOME-client-2=0x000000000000000000000000
trusted.afr.USER-HOME-client-3=0x000000000000000000000000
trusted.afr.USER-HOME-client-5=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x06341b521ba94ab7938eca57f7a1824f
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898e0cf000dd2fe
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00701c90fcb11200fffffef6f08c798e0000006a99819205
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x00701c90fcb11200fffffef6f08c798e0000006a99819205
3]$ getfattr -d -m . -e hex .vim.backup
# file: .vim.backup
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.USER-HOME-client-3=0x000000000000000000000000
trusted.gfid=0x0b3a223955534de89086679a4dce8156
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898621c0005d720
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.06341b52-1ba9-4ab7-938e-ca57f7a1824f.contri.1=0x000000000000040000000000000000020000000000000001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000040000000000000000020000000000000001
3]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
# file: .vim.backup/.bash_profile.swp
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.USER-HOME-client-0=0x000000010000000100000000
trusted.afr.USER-HOME-client-5=0x000000010000000100000000
trusted.gfid=0xc2693670fc6d4fed953f21dcb77a02cf
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5896043c000baa55
trusted.glusterfs.quota.0b3a2239-5553-4de8-9086-679a4dce8156.contri.1=0x00000000000000000000000000000001
trusted.pgfid.0b3a2239-5553-4de8-9086-679a4dce8156=0x00000001
2]$ getfattr -d -m . -e hex .
# file: .
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.USER-HOME-client-1=0x000000000000000000000000
trusted.afr.USER-HOME-client-2=0x000000000000000000000000
trusted.afr.USER-HOME-client-3=0x000000000000000000000000
trusted.afr.USER-HOME-client-5=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x06341b521ba94ab7938eca57f7a1824f
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898e0d000016f82
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0xa5e66200a7a45000cb96fbf7d6336229fae7152d8851097b
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0xa5e66200a7a45000cb96fbf7d6336229fae7152d8851097b
2]$ getfattr -d -m . -e hex .vim.backup
# file: .vim.backup
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.USER-HOME-client-3=0x000000000000000000000000
trusted.gfid=0x0b3a223955534de89086679a4dce8156
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898621b000855fe
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.06341b52-1ba9-4ab7-938e-ca57f7a1824f.contri.1=0x000000000000040000000000000000020000000000000001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000040000000000000000020000000000000001
2]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
# file: .vim.backup/.bash_profile.swp
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.USER-HOME-client-5=0x000000010000000100000000
trusted.afr.USER-HOME-client-6=0x000000010000000100000000
trusted.gfid=0x8a5b6e4ad18a49d0bae920c9cf8673a5
trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5896041400058191
trusted.glusterfs.quota.0b3a2239-5553-4de8-9086-679a4dce8156.contri.1=0x00000000000000000000000000000001
trusted.pgfid.0b3a2239-5553-4de8-9086-679a4dce8156=0x00000001
and the log bit:
GFID mismatch for
<gfid:335bf026-68bd-4bf4-9cba-63b65b12c0b1>/abbreviations.xlsx
6e9a7fa1-bfbe-4a59-ad06-a78ee1625649 on USER-HOME-client-6
and 773b7ea3-31cf-4b24-94f0-0b61b573b082 on USER-HOME-client-0
Most importantly, is there a workaround for the problem as of
now, before the bug, if that is what it is, gets fixed?
b.w.
L.
-- end of paste
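(An aside on that log line: the <gfid:335bf026-...> prefix is the parent directory's gfid. If you ever need to map it back to a path, gluster keeps a symlink for directory gfids under the brick's .glusterfs tree; a sketch, assuming the brick path from the vol info above:

ls -l /__.aLocalStorages/3/0-GLUSTERs/0-USER/.glusterfs/33/5b/335bf026-68bd-4bf4-9cba-63b65b12c0b1
# for a directory this is a symlink whose target ends in the directory's name
)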
But I have a few more files which also report I/O errors, and
heal does NOT even mention them.
On the brick that is the "master" (Samba was sharing it to the users):
# file: abbreviations.log
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000589081fd00060376
trusted.gfid=0x773b7ea331cf4b2494f00b61b573b082
trusted.glusterfs.quota.335bf026-68bd-4bf4-9cba-63b65b12c0b1.contri.1=0x0000000000002a000000000000000001
trusted.pgfid.335bf026-68bd-4bf4-9cba-63b65b12c0b1=0x00000001
on the "slave" brick, was not serving files (certainly not
that file) to any users:
# file: bbreviations.log
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000588c958a000b67ea
trusted.gfid=0x6e9a7fa1bfbe4a59ad06a78ee1625649
trusted.glusterfs.quota.335bf026-68bd-4bf4-9cba-63b65b12c0b1.contri.1=0x0000000000002a000000000000000001
trusted.pgfid.335bf026-68bd-4bf4-9cba-63b65b12c0b1=0x00000001
A question that has probably been answered many times: is it OK to
tamper with (in my case, remove) files directly on the bricks?
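(If it comes to that, the thing to remember is that every regular file on a brick has a second hardlink under .glusterfs, and both must go. A quick check, with a hypothetical path based on the vol info above:

F=/__.aLocalStorages/3/0-GLUSTERs/0-USER/aUser/abbreviations.log  # hypothetical
stat -c '%h %i' "$F"     # link count 2 means the gfid hardlink still exists
getfattr -n trusted.gfid -e hex "$F"  # 0xaabb... maps to .glusterfs/aa/bb/<gfid>
)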
many thanks,
L.
regards,
nag pavan
----- Original Message -----
From: "lejeczek"<peljasz@xxxxxxxxxxx>
To:gluster-users@xxxxxxxxxxx
Sent: Tuesday, 7 February, 2017 2:00:51 AM
Subject: Input/output error - would not heal
hi all
I'm hitting the following problem:
$ gluster vol heal USER-HOME split-brain source-brick
10.5.6.100:/G-store/1
Healing gfid:8a5b6e4a-d18a-49d0-bae9-20c9cf8673a5
failed:Transport endpoint is not connected.
Status: Connected
Number of healed entries: 0
$ gluster vol heal USER-HOME split-brain source-brick
10.5.6.100:/G-store/1/that_file
Lookup failed on /that_file:Input/output error
Volume heal failed.
v3.9. It's a two-brick volume; it was three, but I removed one, I
think a few hours before the problem was first noticed.
what to do now?
many thanks,
L
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users