Hi Avra,
On 20 February 2017 at 02:51, Avra Sengupta <asengupt@xxxxxxxxxx> wrote:
Hi D,
It seems you tried to take a clone of a snapshot when that snapshot was not activated.
Correct. As per my command history, I then noticed the issue, checked the snapshot's status & activated it. I included this in the history just to clear up any doubts from the logs.
However, even in this scenario the cloned volume should not be left in an inconsistent state. I will try to reproduce this and see if it's a bug. In the meantime, could you please answer the following queries:
1. How many nodes were in the cluster?
There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, with an arbiter on s0.
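For clarity, that layout corresponds to a replica 3 arbiter 1 volume created roughly along these lines (just a sketch to illustrate the brick ordering; the brick paths below are placeholders, not our actual ones):

$ # every third brick in the list becomes the arbiter of its replica set
$ gluster volume create data replica 3 arbiter 1 \
      s0:/gluster/data/brick s1:/gluster/data/brick s2:/gluster/data/arbiter \
      s2:/gluster/data/brick s3:/gluster/data/brick s0:/gluster/data/arbiter
$ gluster volume start data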
2. How many bricks does the snapshot data-bck_GMT-2017.02.09-14.15.43 have?
6 bricks, including the 2 arbiters.
3. Was the snapshot clone command issued from a node which did not have any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?
All commands were issued from s0. All volumes have bricks on every node in the cluster.
4. I see you tried to delete the new cloned volume. Did the new cloned volume end up in this state after the failure to create the clone, or after the failure to delete the clone?
I noticed there was something wrong as soon as I created the clone. The clone command completed; however, I was then unable to do anything with it because the clone didn't exist on s1-s3.
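For what it's worth, a quick per-node check along these lines shows the clone only on s0 (nothing fancy, just run on each of s0-s3):

$ gluster volume list          # data-teste is listed only on s0
$ ls /var/lib/glusterd/vols/   # the data-teste directory is only present on s0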
If you want to remove the half-baked volume from the cluster, please proceed with the following steps.
1. Bring down glusterd on all nodes by running the following command on each node:
$ systemctl stop glusterd
Verify that glusterd is down on all nodes by running the following command on each node:
$ systemctl status glusterd
2. Delete the following directory from all the nodes on which it exists:
/var/lib/glusterd/vols/data-teste
The directory only exists on s0, but stopping glusterd on only s0 & deleting the directory didn't work; the directory was restored as soon as glusterd was restarted. I haven't yet tried stopping glusterd on *all* nodes before doing this, although I'll need to plan for that, as it'll take the entire cluster offline.
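When I do get the downtime, the plan is roughly the following (just a sketch, assuming passwordless ssh from s0 to s0-s3; corrections welcome):

# stop glusterd everywhere and confirm it is down
for n in s0 s1 s2 s3; do ssh $n "systemctl stop glusterd"; done
for n in s0 s1 s2 s3; do ssh $n "systemctl status glusterd"; done

# remove the stale volume definition wherever it exists
for n in s0 s1 s2 s3; do ssh $n "rm -rf /var/lib/glusterd/vols/data-teste"; done

# bring glusterd back up
for n in s0 s1 s2 s3; do ssh $n "systemctl start glusterd"; done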
Thanks for the reply,
Doug
Regards,
Avra
On 02/16/2017 08:01 PM, Gambit15 wrote:
Hey guys, I tried to create a new volume from a cloned snapshot yesterday, however something went wrong during the process & I'm now stuck with the new volume being created on the server I ran the commands on (s0), but not on the rest of the peers. I'm unable to delete this new volume from the server, as it doesn't exist on the peers. Any insights into what may have gone wrong? What do I do?
CentOS 7.3.1611, Gluster 3.8.8
The command history & extract from etc-glusterfs-glusterd.vol.log are included below.
gluster volume list
gluster snapshot list
gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
gluster volume status data-teste
gluster volume delete data-teste
gluster snapshot create teste data
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
gluster snapshot status
gluster snapshot activate teste_GMT-2017.02.15-12.44.04
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
[2017-02-15 12:43:21.667403] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-teste
[2017-02-15 12:43:21.682530] E [MSGID: 106301] [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume data-teste is not started
[2017-02-15 12:43:43.633031] I [MSGID: 106495] [glusterd-handler.c:3128:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2017-02-15 12:43:43.640597] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2) [0x7ffb396a14b2] -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65) [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7ffb44ec31c5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post --volname=data-teste
[2017-02-15 13:05:20.103423] E [MSGID: 106122] [glusterd-snapshot.c:2397:glusterd_snapshot_clone_prevalidate] 0-management: Failed to pre validate
[2017-02-15 13:05:20.103464] E [MSGID: 106443] [glusterd-snapshot.c:2413:glusterd_snapshot_clone_prevalidate] 0-management: One or more bricks are not running. Please run snapshot status command to see brick status.
Please start the stopped brick and then issue snapshot clone command
[2017-02-15 13:05:20.103481] W [MSGID: 106443] [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate] 0-management: Snapshot clone pre-validation failed
[2017-02-15 13:05:20.103492] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2017-02-15 13:05:20.103503] E [MSGID: 106122] [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2017-02-15 13:05:20.103514] E [MSGID: 106122] [glusterd-mgmt.c:2243:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2017-02-15 13:05:20.103531] E [MSGID: 106027] [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone data-teste volinfo
[2017-02-15 13:05:20.103542] W [MSGID: 106444] [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2017-02-15 13:05:20.103561] W [MSGID: 106121] [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2017-02-15 13:05:20.103572] E [MSGID: 106121] [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2017-02-15 13:05:20.103582] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2017-02-15 13:11:15.862858] W [MSGID: 106057] [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 0-management: Snap volume c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick1-data-brick not found [Invalid argument]
[2017-02-15 13:11:16.314759] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick on port 49452
[2017-02-15 13:11:16.316090] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-02-15 13:11:16.348867] W [MSGID: 106057] [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 0-management: Snap volume c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick6-data-arbiter not found [Invalid argument]
[2017-02-15 13:11:16.558878] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter on port 49453
[2017-02-15 13:11:16.559883] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-02-15 13:11:23.279721] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: taking snapshot of the brick (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick) of device /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_0 failed
[2017-02-15 13:11:23.279790] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
[2017-02-15 13:11:23.279806] E [MSGID: 106030] [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick s0:/run/gluster/snaps/data-teste/brick1/data/brick volume(data-teste)
[2017-02-15 13:11:23.286678] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: taking snapshot of the brick (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter) of device /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_1 failed
[2017-02-15 13:11:23.286735] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
[2017-02-15 13:11:23.286749] E [MSGID: 106030] [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick s0:/run/gluster/snaps/data-teste/brick6/data/arbiter volume(data-teste)
[2017-02-15 13:11:23.286793] E [MSGID: 106030] [glusterd-snapshot.c:6626:glusterd_schedule_brick_snapshot] 0-management: Failed to create snapshot
[2017-02-15 13:11:23.286813] E [MSGID: 106441] [glusterd-snapshot.c:6796:glusterd_snapshot_clone_commit] 0-management: Failed to take backend snapshot data-teste
[2017-02-15 13:11:25.530666] E [MSGID: 106442] [glusterd-snapshot.c:8308:glusterd_snapshot] 0-management: Failed to clone snapshot
[2017-02-15 13:11:25.530721] W [MSGID: 106123] [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2017-02-15 13:11:25.530735] E [MSGID: 106123] [glusterd-mgmt.c:1427:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2017-02-15 13:11:25.530749] E [MSGID: 106123] [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2017-02-15 13:11:25.532312] E [MSGID: 106027] [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone data-teste volinfo
[2017-02-15 13:11:25.532339] W [MSGID: 106444] [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2017-02-15 13:11:25.532353] W [MSGID: 106121] [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2017-02-15 13:11:25.532367] E [MSGID: 106121] [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2017-02-15 13:11:25.532381] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2017-02-15 13:29:53.779020] E [MSGID: 106062] [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap UUID
[2017-02-15 13:29:53.779073] E [MSGID: 106099] [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict
[2017-02-15 13:29:53.779096] E [MSGID: 106108] [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2017-02-15 13:29:53.779136] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s3. Please check log file for details.
[2017-02-15 13:29:54.136196] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s1. Please check log file for details.
The message "E [MSGID: 106108] [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed to aggregate response from node/brick" repeated 2 times between [2017-02-15 13:29:53.779096] and [2017-02-15 13:29:54.535080]
[2017-02-15 13:29:54.535098] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s2. Please check log file for details.
[2017-02-15 13:29:54.535320] E [MSGID: 106123] [glusterd-mgmt.c:1490:glusterd_mgmt_v3_commit] 0-management: Commit failed on peers
[2017-02-15 13:29:54.535370] E [MSGID: 106123] [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2017-02-15 13:29:54.539708] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s1. Please check log file for details.
[2017-02-15 13:29:54.539797] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s3. Please check log file for details.
[2017-02-15 13:29:54.539856] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s2. Please check log file for details.
[2017-02-15 13:29:54.540224] E [MSGID: 106121] [glusterd-mgmt.c:1713:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed on peers
[2017-02-15 13:29:54.540256] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
The message "E [MSGID: 106062] [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap UUID" repeated 2 times between [2017-02-15 13:29:53.779020] and [2017-02-15 13:29:54.535075]
The message "E [MSGID: 106099] [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict" repeated 2 times between [2017-02-15 13:29:53.779073] and [2017-02-15 13:29:54.535078]
[2017-02-15 13:31:14.285666] I [MSGID: 106488] [glusterd-handler.c:1537:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2017-02-15 13:32:17.827422] E [MSGID: 106027] [glusterd-handler.c:4670:glusterd_get_volume_opts] 0-management: Volume cluster.locking-scheme does not exist
[2017-02-15 13:34:02.635762] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s1. Volume data-teste does not exist
[2017-02-15 13:34:02.635838] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s2. Volume data-teste does not exist
[2017-02-15 13:34:02.635889] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s3. Volume data-teste does not exist
[2017-02-15 13:34:02.636092] E [MSGID: 106122] [glusterd-mgmt.c:947:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2017-02-15 13:34:02.636132] E [MSGID: 106122] [glusterd-mgmt.c:2009:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
[2017-02-15 13:34:20.313228] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
[2017-02-15 13:34:20.313320] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
[2017-02-15 13:34:20.313377] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796455] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796830] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796896] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
Many thanks!
D
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-users