Re: interpreting heal info and reported entries

Stefan <gluster@xxxxxxxxxxxxxxxxx> · Thu, 30 Jan 2020 13:22:00 +0000 (UTC)

We see the same thing: after a reboot or downtime of one server, there are almost always unresolved heal entries, which renders the whole concept of 3x or 2+1 replication kind of moot.

We could often resolve it by just running "touch" on the files through a FUSE mount, after finding out which files corresponded to the mentioned GFIDs.
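For the GFID lookup step, here is a minimal sketch in Python, not a definitive tool. It assumes the usual brick layout, where each GFID has an entry at <brick>/.glusterfs/<first two hex chars>/<next two>/<gfid>, and that for regular files this entry is a hard link to the data file. The function names are made up for illustration, and it has to run on a brick server with read access to the brick.

```python
import os
import uuid


def normalise_gfid(raw):
    """Accept 'xxxxxxxx-...' or '<gfid:xxxxxxxx-...>' and return the canonical form."""
    text = raw.strip().strip("<>")
    if text.startswith("gfid:"):
        text = text[len("gfid:"):]
    # uuid.UUID raises ValueError if the string is not a valid UUID
    return str(uuid.UUID(text))


def gfid_backend_path(brick_root, gfid):
    """Path of the .glusterfs entry for a GFID (assumed standard layout)."""
    canonical = normalise_gfid(gfid)
    return os.path.join(
        brick_root, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )


def paths_for_gfid(brick_root, gfid):
    """Find regular files on the brick that share an inode with the GFID entry.

    Returns paths relative to the brick root. Only meaningful for regular
    files, since only those are assumed to be hard links.
    """
    target = os.stat(gfid_backend_path(brick_root, gfid))
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            # Skip the internal directory so the GFID entry itself is not reported
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                matches.append(os.path.relpath(full, brick_root))
    return matches
```

The relative path it returns should be the same path below the FUSE mount point, which is where we then ran "touch". Never touch or modify files directly on the brick.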

We think it must be either a bug or an undocumented behaviour in GlusterFS.

Stefan

----- Original Message -----
> Date: Wed, 29 Jan 2020 16:26:35 +0000
> To: "'gluster-users@xxxxxxxxxxx'" <gluster-users@xxxxxxxxxxx>
> Subject:  interpreting heal info and reported entries
> 
> 
> I have glusterfs (v6.6) deployed with 3-way replication used by ovirt (v4.3).
> I recently updated 1 of the nodes (now at gluster v6.7) and rebooted. When it
> came back online, glusterfs reported there were entries to be healed under the
> 2 nodes that had stayed online.
> After 2+ days, the 2 nodes still show entries that need healing, so I'm trying
> to determine what the issue is.
> The files shown in the heal info output are small so healing should not take
> long. Also, 'gluster v heal <vol>' and 'gluster v heal <vol> full' both report
> success, but the entries persist.
> 
> 
> So first off, I'm a little confused by what gluster volume heal <vol> info is
> reporting.
> The following is what I see from heal info:
> 
> # gluster v heal engine info
> Brick repo0:/gluster_bricks/engine/engine
> /372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805/4317dd0d-fd35-4176-9353-7ff69e3a8dc3.meta
> /372501f5-062c-4790-afdb-dd7e761828ac/images/4e3e8ca5-0edf-42ae-ac7b-e9a51ad85922/ceb42742-eaaa-4867-aa54-da525629aae4.meta
> Status: Connected
> Number of entries: 2
> 
> Brick repo1:/gluster_bricks/engine/engine
> /372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805/4317dd0d-fd35-4176-9353-7ff69e3a8dc3.meta
> /372501f5-062c-4790-afdb-dd7e761828ac/images/4e3e8ca5-0edf-42ae-ac7b-e9a51ad85922/ceb42742-eaaa-4867-aa54-da525629aae4.meta
> Status: Connected
> Number of entries: 2
> 
> Brick repo2:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
> 
> 
> Repo0 and repo1 were not rebooted, but repo2 was.
> Since repo2 went offline I would expect when it came back online it would have
> entries that need healing, but based on the heal info output that's not what it
> looks like, so I'm thinking maybe heal info isn't reporting what I think it is
> reporting.
> 
> When gluster volume heal <vol> info reports entries as above, what is it
> saying?
> 
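To compare bricks over time, the heal info output quoted above can be grouped per brick with a few lines of Python. This is only a rough sketch that assumes the output keeps the shape shown above ("Brick ..." header, entry lines, then "Status:" and "Number of entries:"). The function name is hypothetical and not part of any Gluster tooling.

```python
def parse_heal_info(output):
    """Group the entries printed by 'gluster volume heal <vol> info' by brick.

    Returns a dict mapping each brick name to the list of entries under it.
    """
    bricks = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif current is None or not line:
            # Ignore blank lines and anything before the first brick header
            continue
        elif line.startswith("Status:") or line.startswith("Number of entries:"):
            # Summary lines, not heal entries
            continue
        else:
            bricks[current].append(line)
    return bricks
```

Saving the result of a few runs some minutes apart shows whether the same entries stay pending on the same bricks or whether the list is actually changing.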