Re: not healing one file

Hi Richard,

Thanks for the information. As you said, there is a gfid mismatch for the file.
On brick-1 & brick-2 the gfids are the same, while on brick-3 (sphere-four) the gfid is different.
This is not considered a split-brain because we have two good copies here.
Gluster 3.10 does not have a way to resolve this situation other than manual
intervention [1]. Basically, what you need to do is remove the file and its
gfid hardlink from brick-3 (treating the brick-3 entry as the bad one). Then, when
you do a lookup for the file from the mount, the entry will be recreated on that brick from the good copies.
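
To make that concrete for your setup, the steps from [1] would look roughly like
the following (a sketch only; the gfid hardlink path is derived from the
trusted.gfid value you posted for sphere-four, so please verify it on the brick
before removing anything, and <mount-point> stands for wherever the home volume
is mounted on a client):

[root@sphere-four ~]# rm /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
[root@sphere-four ~]# rm /srv/gluster_home/brick/.glusterfs/da/1c/da1c94b1-6435-44b1-8d5b-6f4654f60bf5

Then trigger a lookup on the file from a client so the entry gets recreated and
healed from the two good copies:

# stat <mount-point>/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4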

From 3.12 onwards we have ways to resolve this situation with the CLI option [2] and
with favorite-child-policy [3]. For the time being you can use [1] to resolve this,
and if you can consider upgrading to 3.12, that would give you options to handle
these scenarios.

[1] http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
[2] https://review.gluster.org/#/c/17485/
[3] https://review.gluster.org/#/c/16878/
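
For reference, on 3.12 the two options would look something like the following
(syntax from memory, so please double-check it against the 3.12 documentation;
<FILE> is the file path as seen from the volume root). Resolving a single file
by picking one of the good bricks as the source:

# gluster volume heal home split-brain source-brick sphere-six:/srv/gluster_home/brick <FILE>

or letting AFR pick a winner automatically via a policy, for example by mtime:

# gluster volume set home cluster.favorite-child-policy mtime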

HTH,
Karthik

On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck <hawk@xxxxxxxxxxxxxxxx> wrote:
Hi Karthik,

thanks for taking a look at this. I haven't been working with gluster long
enough to make heads or tails of the logs. The logs are attached to
this mail and here is the other information:

# gluster volume info home

Volume Name: home
Type: Replicate
Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-six:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-four:/srv/gluster_home/brick
Options Reconfigured:
features.barrier: disable
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 1GB
performance.client-io-threads: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.server-quorum-ratio: 51%


[root@sphere-four ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file:
srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059df20a40006f989
trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

[root@sphere-five ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file:
srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df1f310006ce63
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

[root@sphere-six ~]# getfattr -d -e hex -m .
/srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file:
srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df11cd000548ec
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

Cheers
Richard

On 26.10.17 07:41, Karthik Subrahmanya wrote:
> Hey Richard,
>
> Could you share the following information please?
> 1. gluster volume info <volname>
> 2. getfattr output of that file from all the bricks
>     getfattr -d -e hex -m . <brickpath/filepath>
> 3. glustershd & glfsheal logs
>
> Regards,
> Karthik
>
> On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi <atumball@xxxxxxxxxx> wrote:
>
>     On a side note, try the recently released health report tool and see if
>     it diagnoses any issues in the setup. Currently you may have to run
>     it on all three machines.
>
>
>
>     On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball@xxxxxxxxxx> wrote:
>
>         Thanks for this report. This week many of the developers are at
>         Gluster Summit in Prague; we will check this and respond next
>         week. Hope that's fine.
>
>         Thanks,
>         Amar
>
>
>         On 25-Oct-2017 3:07 PM, "Richard Neuboeck" <hawk@xxxxxxxxxxxxxxxx> wrote:
>
>             Hi Gluster Gurus,
>
>             I'm using a gluster volume as home for our users. The volume is
>             replica 3, running on CentOS 7, gluster version 3.10
>             (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>             gluster 3.10 (3.10.6-3.fc26.x86_64).
>
>             During the data backup I got an I/O error on one file. Manually
>             checking for this file on a client confirms this:
>
>             ls -l
>             romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
>             ls: cannot access
>             'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
>             Input/output error
>             total 2015
>             -rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>             -rw-------. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>             -rw-------. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>             -?????????? ? ?        ?        ?            ? recovery.baklz4
>
>             Out of curiosity I checked all the bricks for this file. It's
>             present there. Making a checksum shows that the file is
>             different on
>             one of the three replica servers.
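>
>             (For example, a checksum over the brick path on each host, along
>             these lines:
>             # sha256sum /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>             gives a different hash on one of the bricks.)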
>
>             Querying healing information shows that the file should be
>             healed:
>             # gluster volume heal home info
>             Brick sphere-six:/srv/gluster_home/brick
>             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>
>             Status: Connected
>             Number of entries: 1
>
>             Brick sphere-five:/srv/gluster_home/brick
>             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>
>             Status: Connected
>             Number of entries: 1
>
>             Brick sphere-four:/srv/gluster_home/brick
>             Status: Connected
>             Number of entries: 0
>
>             Manually triggering heal doesn't report an error but also
>             does not
>             heal the file.
>             # gluster volume heal home
>             Launching heal operation to perform index self heal on
>             volume home
>             has been successful
>
>             Same with a full heal
>             # gluster volume heal home full
>             Launching heal operation to perform full self heal on volume
>             home
>             has been successful
>
>             According to the split brain query that's not the problem:
>             # gluster volume heal home info split-brain
>             Brick sphere-six:/srv/gluster_home/brick
>             Status: Connected
>             Number of entries in split-brain: 0
>
>             Brick sphere-five:/srv/gluster_home/brick
>             Status: Connected
>             Number of entries in split-brain: 0
>
>             Brick sphere-four:/srv/gluster_home/brick
>             Status: Connected
>             Number of entries in split-brain: 0
>
>
>             I have no idea why this situation arose in the first place and
>             also no idea how to solve this problem. I would highly
>             appreciate any helpful feedback I can get.
>
>             The only mention in the logs matching this file is a rename
>             operation:
>             /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>             09:19:11.561661] I [MSGID: 115061]
>             [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server:
>             5266153:
>             RENAME
>             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
>             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>             romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
>             error-xlator: home-posix [No data available]
>
>             I enabled directory quotas the same day this problem showed
>             up, but I'm not sure how quotas could have an effect like this
>             (unless maybe the limit is reached, but that's also not the case).
>
>             Thanks again if anyone has an idea.
>             Cheers
>             Richard
>             --
>             /dev/null
>
>
>             _______________________________________________
>             Gluster-users mailing list
>             Gluster-users@xxxxxxxxxxx
>             http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>     _______________________________________________
>     Gluster-users mailing list

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
