Re: Quick way to fix stale gfids?

Oops... re-including the list that got dropped from my previous answer :(

I generated md5sums of all files in vols/ on clustor02 and compared them to the other nodes (clustor00 and clustor01). There are differences in the volfiles (the shared-brick-count value: shouldn't it always be 1, since every data brick is on its own filesystem? The quorum bricks, OTOH, share a single partition on SSD and should always be 15, but in both cases it's sometimes 0).
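
For reference, this is roughly the comparison I ran (a sketch from memory; clustor* are my node names, adjust hosts and paths for your setup):

-8<--
# Checksum every file under vols/ locally and on the other peers,
# then diff the sorted lists.
for host in clustor00 clustor01; do
  ssh "$host" 'cd /var/lib/glusterd/vols && find . -type f -exec md5sum {} +' \
    | sort -k2 > "/tmp/vols-$host.md5"
done
(cd /var/lib/glusterd/vols && find . -type f -exec md5sum {} +) \
  | sort -k2 > /tmp/vols-clustor02.md5
diff /tmp/vols-clustor02.md5 /tmp/vols-clustor00.md5
diff /tmp/vols-clustor02.md5 /tmp/vols-clustor01.md5
-8<--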

I nearly had a stroke when I saw the diff output for the 'info' files, but once I sorted them their contents matched. Phew!
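
For the record, comparing them sorted is enough, something like:

-8<--
# 'info' can list the same options in a different order on each node,
# so diff the sorted contents (clustor00 is one of my peers;
# substitute your volume name for <VOLUME_NAME>).
diff <(sort /var/lib/glusterd/vols/<VOLUME_NAME>/info) \
     <(ssh clustor00 sort /var/lib/glusterd/vols/<VOLUME_NAME>/info)
-8<--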

Diego

On 03/02/2023 19:01, Strahil Nikolov wrote:
This one doesn't look good:


[2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
[client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48:
remote-subvolume not set in volfile []


Can you compare all the vol files in /var/lib/glusterd/vols/ between the nodes?
I have the suspicion that there is a vol file mismatch (maybe /var/lib/glusterd/vols/<VOLUME_NAME>/*-shd.vol).

Best Regards,
Strahil Nikolov

    On Fri, Feb 3, 2023 at 12:20, Diego Zuccato
    <diego.zuccato@xxxxxxxx> wrote:
    Can't see anything relevant in the glfsheal log, just messages related
    to the crash of one of the nodes (the one that had its mobo replaced...
    I fear some on-disk structures could have been silently damaged by RAM
    errors and that's what makes the gluster processes crash, or it's just
    an issue with enabling brick-multiplex).
    -8<--
    [2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
    [client-handshake.c:1253:client_query_portmap]
    0-cluster_data-client-48:
    remote-subvolume not set in volfile []
    [2023-02-03 07:45:46.897282 +0000] E
    [rpc-clnt.c:331:saved_frames_unwind] (-->
    /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95]
    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc] (-->
    /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419]
    (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308] (-->
    /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6]
    ))))) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP)
    op(NULL(2)) called at 2023-02-03 07:45:46.891054 +0000 (xid=0x13)
    -8<--

    Well, actually I *KNOW* the files outside .glusterfs have been deleted
    (by me :) ). That's why I call those gfids 'stale'.
    The affected entries under .glusterfs usually have link count = 1, so
    there's nothing 'find' can find.
    Since I already recovered those files (before deleting them from the
    bricks), can the .glusterfs entries be deleted too, or should I check
    something else first?
    Maybe I should create a script that finds all files/dirs (not
    symlinks, IIUC) in .glusterfs on all bricks/arbiters and moves them
    to a temp dir?
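
    Something along these lines, maybe (untested sketch: /srv/bricks/*/d
    are my brick roots and the hex glob keeps it inside the gfid tree, so
    indices/ and friends are not touched):

    -8<--
    # Move regular gfid files with link count 1 (i.e. no hard link left
    # in the normal hierarchy) to a temp area, preserving paths.
    dest=/srv/stale-gfids
    for brick in /srv/bricks/*/d; do
        find "$brick"/.glusterfs/[0-9a-f][0-9a-f] -type f -links 1 -print0 |
        while IFS= read -r -d '' f; do
            mkdir -p "$dest${f%/*}"
            mv -v "$f" "$dest${f%/*}/"
        done
    done
    -8<--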

    Diego

    On 02/02/2023 23:35, Strahil Nikolov wrote:
     > Any issues reported in /var/log/glusterfs/glfsheal-*.log ?
     >
     > The easiest way to identify the affected entries is to run:
     > find /FULL/PATH/TO/BRICK/ -samefile \
     >   /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a
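     >
     > To cover all pending entries at once, something like this should
     > work (a sketch; assumes you've saved the bare gfids from
     > 'gluster volume heal <VOLUME_NAME> info' into gfids.txt):
     >
     > -8<--
     > # For each gfid, print any hard link outside .glusterfs.
     > while read -r gfid; do
     >     echo "== $gfid =="
     >     find /FULL/PATH/TO/BRICK/ -path '*/.glusterfs' -prune -o \
     >         -samefile "/FULL/PATH/TO/BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid" \
     >         -print
     > done < gfids.txt
     > -8<--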
     >
     >
     > Best Regards,
     > Strahil Nikolov
     >
     >
     > On Tuesday, 31 January 2023 at 11:58:24 GMT+2, Diego Zuccato
     > <diego.zuccato@xxxxxxxx> wrote:
     >
     >
     > Hello all.
     >
     > I've had one of the 3 nodes serving a "replica 3 arbiter 1" volume
     > down for some days (apparently RAM issues, but actually a failing
     > mobo). The other nodes have had some issues too (RAM exhaustion, an
     > old problem already ticketed but still unsolved) and some brick
     > processes coredumped. Restarting the processes allowed the cluster
     > to keep working. Mostly.
     >
     > After the third server got fixed I started a heal, but the files
     > didn't get healed and the count (by "ls -l
     > /srv/bricks/*/d/.glusterfs/indices/xattrop/|grep ^-|wc -l") did not
     > decrease over 2 days. So, to recover, I copied the files from the
     > bricks to temp storage (keeping both copies of conflicting files
     > with different contents), removed the files on bricks and arbiters,
     > and finally copied everything back from temp storage to the volume.
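     >
     > Per brick, the same count can be taken roughly like this (my brick
     > layout; adjust the glob):
     >
     > -8<--
     > for d in /srv/bricks/*/d; do
     >     echo "$d: $(ls -l "$d"/.glusterfs/indices/xattrop/ | grep -c '^-')"
     > done
     > -8<--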
     >
     > Now the files are accessible but I still see lots of entries like
     > <gfid:57e428c7-6bed-4eb3-b9bd-02ca4c46657a>
     >
     > IIUC that's due to a mismatch between the .glusterfs/ contents and
     > the normal hierarchy. Is there some tool to speed up the cleanup?
     >
     > Tks.
     >
     > --
     > Diego Zuccato
     > DIFA - Dip. di Fisica e Astronomia
     > Servizi Informatici
     > Alma Mater Studiorum - Università di Bologna
     > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
     > tel.: +39 051 20 95786


    --
    Diego Zuccato
    DIFA - Dip. di Fisica e Astronomia
    Servizi Informatici
    Alma Mater Studiorum - Università di Bologna
    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
    tel.: +39 051 20 95786


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



