Re: Quick way to fix stale gfids?

That is not normal.
Which version are you using?


Can you provide the output from all bricks (including the arbiter):
getfattr -d -m . -e hex  /BRICK/PATH/TO/output_21
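
Something like this would gather it from all three nodes in one go (just a sketch: the node names are taken from your earlier mails, /BRICK/PATH/TO is a placeholder and the path will differ on the arbiter brick; run it as root so the trusted.* xattrs are visible):

for H in clustor00 clustor01 clustor02; do
    echo "=== $H ==="
    ssh "$H" 'getfattr -d -m . -e hex /BRICK/PATH/TO/output_21'
done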

Troubleshooting and restoring the files should be your secondary task; focus on stabilizing the cluster first.
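
A quick way to see which bricks are actually down right now is something like this (assuming the volume is named cluster_data, as in your heal command; the Online column is the second-to-last field of the status output, though column positions may vary slightly between versions):

gluster volume status cluster_data | awk '$(NF-1) == "N"'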

First, if you have the disk space, enable debug logging for the bricks (see https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/configuring_the_log_level ) so you can troubleshoot why they are dying.
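
For the bricks that would be something along these lines (DEBUG is very chatty, so revert once you have captured a crash):

gluster volume set cluster_data diagnostics.brick-log-level DEBUG
# ... reproduce / wait for a brick to die, then collect /var/log/glusterfs/bricks/*.log ...
gluster volume set cluster_data diagnostics.brick-log-level INFO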

Best Regards,
Strahil Nikolov

On Mon, Feb 13, 2023 at 13:21, Diego Zuccato
<diego.zuccato@xxxxxxxx> wrote:
My volume is replica 3 arbiter 1, maybe that makes a difference?
Brick processes tend to die quite often (I have to restart glusterd at
least once a day because "gluster v status | grep ' N '" reports at least
one missing brick; sometimes, even if all bricks are reported up, I have
to kill all glusterfs[d] processes and restart glusterd).

The 3 servers have 192GB RAM (that should be way more than enough!), 30
data bricks and 15 arbiters (the arbiters share a single SSD).

And I noticed that some "stale file handle" errors are not reported by heal info.

root@str957-cluster:/# ls -l
/scratch/extra/m******/PNG/PNGQuijote/ModGrav/fNL40/
ls: cannot access
'/scratch/extra/m******/PNG/PNGQuijote/ModGrav/fNL40/output_21': Stale
file handle
total 40
d?????????  ? ?            ?              ?            ? output_21
...
but "gluster v heal cluster_data info |grep output_21" returns nothing. :(

Seems the other stale handles either got corrected by subsequent 'stat's
or became I/O errors.

Diego.

On 12/02/2023 21:34, Strahil Nikolov wrote:
> The second error indicates conflicts between the nodes. The only way that
> could happen on replica 3 is a gfid conflict (a file/dir was renamed or
> recreated).
>
> Are you sure that all bricks are online? Usually 'Transport endpoint is
> not connected' indicates a brick down situation.
>
> First, start with all the stale file handles:
> check the md5sum on all bricks. If it differs somewhere, delete the gfid
> entry and move the file away from the brick, then check it in FUSE. If
> it's fine, touch it and the FUSE client will "heal" it.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>    On Tue, Feb 7, 2023 at 16:33, Diego Zuccato
>    <diego.zuccato@xxxxxxxx> wrote:
>    The contents do not match exactly, but the only difference is the
>    "option shared-brick-count" line that sometimes is 0 and sometimes 1.
>
>    The command you gave could be useful for the files that still need
>    healing with the source still present, but the files related to the
>    stale gfids have been deleted, so "find -samefile" won't find anything.
>
>    For the other files reported by heal info, I saved the output to
>    'healinfo', then:
>        for T in $(grep '^/' healinfo |sort|uniq); do stat /mnt/scratch$T > /dev/null; done
>
>    but I still see a lot of 'Transport endpoint is not connected' and
>    'Stale file handle' errors :( And many 'No such file or directory'...
>
>    I don't understand the first two errors, since /mnt/scratch has been
>    freshly mounted after enabling client healing, and gluster v status does
>    not highlight unconnected/down bricks.
>
>    Diego
>
>    On 06/02/2023 22:46, Strahil Nikolov wrote:
>      > I'm not sure if the md5sum has to match, but at least the content
>      > should.
>      > In modern versions of GlusterFS the client-side healing is disabled,
>      > but it's worth trying.
>      > You will need to enable cluster.metadata-self-heal,
>      > cluster.data-self-heal and cluster.entry-self-heal and then create a
>      > small one-liner that identifies the names of the files/dirs from the
>      > volume heal info output, so you can stat them through FUSE.
>      >
>      > Something like this:
>      >
>      >
>      > for i in $(gluster volume heal <VOL> info | awk -F '<gfid:|>' '/gfid:/ {print $2}'); do find /PATH/TO/BRICK/ -samefile /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i | awk '!/.glusterfs/ {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}'; done
>      >
>      > Then just copy-paste the output and you will trigger the client-side
>      > heal only on the affected gfids.
>      >
>      > Best Regards,
>      > Strahil Nikolov
>      > On Monday, February 6, 2023 at 10:19:02 GMT+2, Diego Zuccato
>      > <diego.zuccato@xxxxxxxx> wrote:
>      >
>      >
>      > Oops... Re-including the list that got excluded in my previous
>      > answer :(
>      >
>      > I generated md5sums of all files in vols/ on clustor02 and compared
>      > them to the other nodes (clustor00 and clustor01).
>      > There are differences in the volfiles (shouldn't shared-brick-count
>      > always be 1, since every data brick is on its own fs? Quorum bricks,
>      > OTOH, share a single partition on SSD and should always be 15, but in
>      > both cases it's sometimes 0).
>      >
>      > I nearly got a stroke when I saw the diff output for the 'info' files,
>      > but once I sorted 'em their contents matched. Phew!
>      >
>      > Diego
>      >
>      > On 03/02/2023 19:01, Strahil Nikolov wrote:
>      >  > This one doesn't look good:
>      >  >
>      >  >
>      >  > [2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
>      >  > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48:
>      >  > remote-subvolume not set in volfile []
>      >  >
>      >  >
>      >  > Can you compare all vol files in /var/lib/glusterd/vols/ between
>      >  > the nodes?
>      >  > I have the suspicion that there is a vol file mismatch (maybe
>      >  > /var/lib/glusterd/vols/<VOLUME_NAME>/*-shd.vol).
>      >  >
>      >  > Best Regards,
>      >  > Strahil Nikolov
>      >  >
>      >  >    On Fri, Feb 3, 2023 at 12:20, Diego Zuccato
>      >  >    <diego.zuccato@xxxxxxxx> wrote:
>      >  >    Can't see anything relevant in the glfsheal log, just messages
>      >  >    related to the crash of one of the nodes (the one that had the
>      >  >    mobo replaced... I fear some on-disk structures could have been
>      >  >    silently damaged by RAM errors and that makes gluster processes
>      >  >    crash, or it's just an issue with enabling brick-multiplex).
>      >  >    -8<--
>      >  >    [2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
>      >  >    [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48:
>      >  >    remote-subvolume not set in volfile []
>      >  >    [2023-02-03 07:45:46.897282 +0000] E [rpc-clnt.c:331:saved_frames_unwind]
>      >  >    (--> /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95]
>      >  >    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc]
>      >  >    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419]
>      >  >    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308]
>      >  >    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6]
>      >  >    ))))) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP)
>      >  >    op(NULL(2)) called at 2023-02-03 07:45:46.891054 +0000 (xid=0x13)
>      >  >    -8<--
>      >  >
>      >  >    Well, actually I *KNOW* the files outside .glusterfs have been
>      >  >    deleted (by me :) ). That's why I call those 'stale' gfids.
>      >  >    Affected entries under .glusterfs usually have link count = 1, so
>      >  >    there is nothing 'find' can find.
>      >  >    Since I already recovered those files (before deleting them from
>      >  >    the bricks), can the .glusterfs entries be deleted too, or should
>      >  >    I check something else?
>      >  >    Maybe I should create a script that finds all files/dirs (not
>      >  >    symlinks, IIUC) in .glusterfs on all bricks/arbiters and moves 'em
>      >  >    to a temp dir?
>      >  >
>      >  >    Diego
>      >  >
>      >  >    On 02/02/2023 23:35, Strahil Nikolov wrote:
>      >  >      > Any issues reported in /var/log/glusterfs/glfsheal-*.log?
>      >  >      >
>      >  >      > The easiest way to identify the affected entries is to run:
>      >  >      > find /FULL/PATH/TO/BRICK/ -samefile \
>      >  >      >   /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a
>      >  >      >
>      >  >      >
>      >  >      > Best Regards,
>      >  >      > Strahil Nikolov
>      >  >      >
>      >  >      >
>      >  >      > On Tuesday, January 31, 2023 at 11:58:24 GMT+2, Diego Zuccato
>      >  >      > <diego.zuccato@xxxxxxxx> wrote:
>      >  >      >
>      >  >      >
>      >  >      > Hello all.
>      >  >      >
>      >  >      > I've had one of the 3 nodes serving a "replica 3 arbiter 1"
>      >  >      > volume down for some days (apparently RAM issues, but actually
>      >  >      > a failing mobo).
>      >  >      > The other nodes have had some issues (RAM exhaustion, an old
>      >  >      > problem already ticketed but still with no solution) and some
>      >  >      > brick processes coredumped. Restarting the processes allowed
>      >  >      > the cluster to continue working. Mostly.
>      >  >      >
>      >  >      > After the third server got fixed I started a heal, but files
>      >  >      > didn't get healed and the count (by "ls -l
>      >  >      > /srv/bricks/*/d/.glusterfs/indices/xattrop/ | grep ^- | wc -l")
>      >  >      > did not decrease over 2 days. So, to recover, I copied the
>      >  >      > files from the bricks to temp storage (keeping both copies of
>      >  >      > conflicting files with different contents), removed the files
>      >  >      > on bricks and arbiters, and finally copied them back from temp
>      >  >      > storage to the volume.
>      >  >      >
>      >  >      > Now the files are accessible, but I still see lots of
>      >  >      > entries like
>      >  >      > <gfid:57e428c7-6bed-4eb3-b9bd-02ca4c46657a>
>      >  >      >
>      >  >      > IIUC that's due to a mismatch between the .glusterfs/ contents
>      >  >      > and the normal hierarchy. Is there some tool to speed up the
>      >  >      > cleanup?
>      >  >      >
>      >  >      > Tks.
>      >  >      >
>      >  >      > --
>      >  >      > Diego Zuccato
>      >  >      > DIFA - Dip. di Fisica e Astronomia
>      >  >      > Servizi Informatici
>      >  >      > Alma Mater Studiorum - Università di Bologna
>      >  >      > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>      >  >      > tel.: +39 051 20 95786
>      >  >
>      >  >    --
>      >  >    Diego Zuccato
>      >  >    DIFA - Dip. di Fisica e Astronomia
>      >  >    Servizi Informatici
>      >  >    Alma Mater Studiorum - Università di Bologna
>      >  >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>      >  >    tel.: +39 051 20 95786
>      >  >
>      >
>      > --
>      > Diego Zuccato
>      > DIFA - Dip. di Fisica e Astronomia
>      > Servizi Informatici
>      > Alma Mater Studiorum - Università di Bologna
>      > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>      > tel.: +39 051 20 95786
>
>    --
>    Diego Zuccato
>    DIFA - Dip. di Fisica e Astronomia
>    Servizi Informatici
>    Alma Mater Studiorum - Università di Bologna
>    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>    tel.: +39 051 20 95786
>

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
