Hi Richard,
Thanks for the information. As you said, there is a gfid mismatch for the file: on brick-1 and brick-2 the gfids are the same, and on brick-3 the gfid is different.
This is not considered a split-brain because we have two good copies here.
Gluster 3.10 does not have a method to resolve this situation other than manual
intervention [1]. Basically what you need to do is remove the file and the gfid
hardlink from brick-3 (considering the brick-3 entry as the bad copy). Then when
you do a lookup for the file from the mount, it will recreate the entry on that brick.
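To make that concrete, here is a rough sketch of the steps from [1], assuming (based on the getfattr output below) that the copy on sphere-four is the bad one, since its trusted.gfid differs from the other two bricks. Please double-check the gfid and paths on your setup before removing anything; the client mount point used below is just a placeholder.

On sphere-four, remove the file from the brick and its gfid hardlink under .glusterfs (the hardlink path is <brick>/.glusterfs/<first two hex chars>/<next two hex chars>/<gfid in uuid form>):

# rm /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
# rm /srv/gluster_home/brick/.glusterfs/da/1c/da1c94b1-6435-44b1-8d5b-6f4654f60bf5

Then trigger a lookup from a client mount so the entry gets recreated from the good copies, and let heal finish:

# stat /mnt/home/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
# gluster volume heal home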
From 3.12 we have methods to resolve this situation with the cli option [2] and
with favorite-child-policy [3]. For the time being you can use [1] to resolve this,
and if you can consider upgrading to 3.12, that would give you options to handle
these scenarios.
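For reference, with 3.12 the resolution would look roughly like one of the following (volume and file names taken from this thread; please check the split-brain resolution documentation for the exact syntax on your version, and note the file path is given relative to the volume root):

# gluster volume heal home split-brain latest-mtime /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
# gluster volume heal home split-brain source-brick sphere-five:/srv/gluster_home/brick /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

or, to let the cluster pick a winner automatically:

# gluster volume set home cluster.favorite-child-policy mtime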
[1] http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
[2] https://review.gluster.org/#/c/17485/
[3] https://review.gluster.org/#/c/16878/
HTH,
Karthik
On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck <hawk@xxxxxxxxxxxxxxxx> wrote:
Hi Karthik,
thanks for taking a look at this. I haven't been working with gluster long
enough to make heads or tails of the logs. The logs are attached to
this mail and here is the other information:
# gluster volume info home
Volume Name: home
Type: Replicate
Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-six:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-four:/srv/gluster_home/brick
Options Reconfigured:
features.barrier: disable
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 1GB
performance.client-io-threads: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.server-quorum-ratio: 51%
[root@sphere-four ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059df20a40006f989
trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
[root@sphere-five ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df1f310006ce63
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
[root@sphere-six ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df11cd000548ec
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
Cheers
Richard
On 26.10.17 07:41, Karthik Subrahmanya wrote:
> Hey Richard,
>
> Could you share the following information please?
> 1. gluster volume info <volname>
> 2. getfattr output of that file from all the bricks
> getfattr -d -e hex -m . <brickpath/filepath>
> 3. glustershd & glfsheal logs
>
> Regards,
> Karthik
>
> On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi <atumball@xxxxxxxxxx> wrote:
>
> On a side note, try the recently released health report tool and see if
> it diagnoses any issues in your setup. Currently you may have to run
> it on all three machines.
>
>
>
> On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball@xxxxxxxxxx> wrote:
>
> Thanks for this report. This week many of the developers are at
> Gluster Summit in Prague; we will be checking this and responding next
> week. Hope that's fine.
>
> Thanks,
> Amar
>
>
> On 25-Oct-2017 3:07 PM, "Richard Neuboeck"
> <hawk@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Gluster Gurus,
>
> I'm using a gluster volume as home for our users. The volume is
> replica 3, running on CentOS 7, gluster version 3.10
> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
> gluster 3.10 (3.10.6-3.fc26.x86_64).
>
> During the data backup I got an I/O error on one file. Manually
> checking for this file on a client confirms this:
>
> ls -l romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
> ls: cannot access
> 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
> Input/output error
> total 2015
> -rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> -rw-------. 1 romanoch tbi 65222 Oct 17 17:57 previous.jsonlz4
> -rw-------. 1 romanoch tbi 149161 Oct 1 13:46 recovery.bak
> -?????????? ? ? ? ? ? recovery.baklz4
>
> Out of curiosity I checked all the bricks for this file. It's
> present there. Making a checksum shows that the file is
> different on
> one of the three replica servers.
>
> Querying healing information shows that the file should be
> healed:
> # gluster volume heal home info
> Brick sphere-six:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-five:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> Status: Connected
> Number of entries: 1
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries: 0
>
> Manually triggering heal doesn't report an error but also
> does not
> heal the file.
> # gluster volume heal home
> Launching heal operation to perform index self heal on
> volume home
> has been successful
>
> Same with a full heal
> # gluster volume heal home full
> Launching heal operation to perform full self heal on volume
> home
> has been successful
>
> According to the split brain query that's not the problem:
> # gluster volume heal home info split-brain
> Brick sphere-six:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-five:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
>
> I have no idea why this situation arose in the first place
> and also
> no idea how to solve this problem. I would highly
> appreciate any
> helpful feedback I can get.
>
> The only mention in the logs matching this file is a rename
> operation:
> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
> 09:19:11.561661] I [MSGID: 115061]
> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server:
> 5266153:
> RENAME
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
> error-xlator: home-posix [No data available]
>
> I enabled directory quotas the same day this problem showed
> up but
> I'm not sure how quotas could have an effect like this
> (maybe unless
> the limit is reached but that's also not the case).
>
> Thanks again if anyone has an idea.
> Cheers
> Richard
> --
> /dev/null
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users