Re: orphaned snapshots

Strahil Nikolov:
I’ve never had such a situation and I don’t recall someone sharing something similar.


That's strange; it is really easy to reproduce. This is from a fresh test environment:

summary:
- There is one snapshot present.
- glusterd is stopped on one node.
- While that node is down, the snapshot is deleted from another node.
- The stopped node is brought up again.
- On that node the snapshot is now orphaned.
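
In command form (the same sequence as the detailed output below):

root@gl3:~# systemctl stop glusterd.service
root@gl1:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
root@gl1:~# gluster snapshot delete snaps_GMT-2023.08.15-13.05.28
root@gl3:~# systemctl start glusterd.service
root@gl3:~# gluster snapshot list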


detailed version:
# on node 1:
root@gl1:~# cat /etc/debian_version
11.7

root@gl1:~# gluster --version
glusterfs 10.4

root@gl1:~# gluster volume info
Volume Name: glvol_samba
Type: Replicate
Volume ID: 91cb059e-10e4-4439-92ea-001065652749
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/glusterfs/glvol_samba/brick0/brick
Brick2: gl2:/data/glusterfs/glvol_samba/brick0/brick
Brick3: gl3:/data/glusterfs/glvol_samba/brick0/brick
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable

root@gl1:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28



# on node 3:
root@gl3:~# systemctl stop glusterd.service



# on node 1:
root@gl1:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
Snapshot deactivate: snaps_GMT-2023.08.15-13.05.28: Snap deactivated successfully

root@gl1:~# gluster snapshot delete snaps_GMT-2023.08.15-13.05.28
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snaps_GMT-2023.08.15-13.05.28: snap removed successfully

root@gl1:~# gluster snapshot list
No snapshots present



# on node 3:
root@gl3:~# systemctl start glusterd.service

root@gl3:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28

root@gl3:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
snapshot deactivate: failed: Pre Validation failed on gl1.ad.arc.de. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Pre Validation failed on gl2. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Snapshot command failed

root@gl3:~# lvs -a
  LV                                 VG        Attr       LSize  Pool      Origin    Data%  Meta%  Move Log Cpy%Sync Convert
  669cbc14fa7542acafb2995666284583_0 vg_brick0 Vwi-aotz-- 15,00g tp_brick0 lv_brick0 0,08
  lv_brick0                          vg_brick0 Vwi-aotz-- 15,00g tp_brick0           0,08
  [lvol0_pmspare]                    vg_brick0 ewi------- 20,00m
  tp_brick0                          vg_brick0 twi-aotz-- 18,00g                     0,12   10,57
  [tp_brick0_tdata]                  vg_brick0 Twi-ao---- 18,00g
  [tp_brick0_tmeta]                  vg_brick0 ewi-ao---- 20,00m
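
The LV name prefix looks like the snap volume's internal UUID (an inference from the lvs output, not something I have verified), so presumably one can confirm which snapshot it belongs to on node 3 with something like:

root@gl3:~# grep -r 669cbc14fa7542acafb2995666284583 /var/lib/glusterd/snaps/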




Would it be dangerous to just delete the following items on node 3 while gluster is down:
- the orphaned directories in /var/lib/glusterd/snaps/
- the orphaned LV, here 669cbc14fa7542acafb2995666284583_0 in vg_brick0
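
Concretely, I imagine something like this (names taken from the output above; an untested sketch, not a verified procedure):

root@gl3:~# systemctl stop glusterd.service
root@gl3:~# # if a snap brick is still mounted under /run/gluster/snaps, umount it first
root@gl3:~# rm -rf /var/lib/glusterd/snaps/snaps_GMT-2023.08.15-13.05.28
root@gl3:~# lvremove vg_brick0/669cbc14fa7542acafb2995666284583_0
root@gl3:~# systemctl start glusterd.service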

Or is there a self-heal command?
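
And for completeness, the remove/re-add route suggested below would presumably be along these lines (hypothetical commands; as far as I know glusterd refuses to detach a peer that still hosts bricks, so Brick3 would have to be replaced or removed first):

root@gl1:~# gluster peer detach gl3
root@gl1:~# gluster peer probe gl3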

Regards
Sebastian

On 10.08.2023 at 20:33, Strahil Nikolov wrote:
I’ve never had such a situation and I don’t recall someone sharing something similar.

Most probably it’s easier to remove the node from the TSP and re-add it.
Of course, test the case in VMs first, just to validate that it’s possible to add a node to a cluster with snapshots.

I have a vague feeling that you will need to delete all snapshots.

Best Regards,
Strahil Nikolov 

On Thursday, August 10, 2023, 4:36 AM, Sebastian Neustein <sebastian.neustein@xxxxxxxxxxxxx> wrote:

Hi

Due to an outage of one node, after bringing it up again, the node has some orphaned snapshots, which have already been deleted on the other nodes.

How can I delete these orphaned snapshots? Trying the normal way produces these errors:
[2023-08-08 19:34:03.667109 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B742. Please check log file for details.
[2023-08-08 19:34:03.667184 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B741. Please check log file for details.
[2023-08-08 19:34:03.667210 +0000] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2023-08-08 19:34:03.667236 +0000] E [MSGID: 106121] [glusterd-mgmt.c:2875:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

Even worse: I followed the Red Hat Gluster snapshot troubleshooting guide and deleted one of the directories defining a snapshot. Now I receive this on the CLI:
run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +0000] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +0000] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM

What are my options?
- is there an easy way to remove all those snapshots?
- or would it be easier to remove and rejoin the node to the gluster cluster?

Thank you for any help!

Seb
-- 
Sebastian Neustein

Airport Research Center GmbH
Bismarckstraße 61
52066 Aachen
Germany

Phone: +49 241 16843-23
Fax: +49 241 16843-19
e-mail: sebastian.neustein@xxxxxxxxxxxxx
Website: http://www.airport-consultants.com

Register Court: Amtsgericht Aachen HRB 7313
Ust-Id-No.: DE196450052

Managing Director:
Dipl.-Ing. Tom Alexander Heuer
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
