Hi Avra,
On 20 February 2017 at 02:51, Avra Sengupta <asengupt@xxxxxxxxxx> wrote:
Hi D,
It seems you tried to take a clone of a snapshot when that snapshot was not activated.
Correct. As per my command history, I then noticed the issue, checked the snapshot's status & activated it. I included this in the history just to clear up any doubts from the logs.
However, even in this scenario the cloned volume should not be left in an inconsistent state. I will try to reproduce this and see if it's a bug. In the meantime, could you please answer the following queries:
1. How many nodes were in the cluster?
There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, with an arbiter on s0.
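For clarity, that layout corresponds to a replica 3 arbiter 1 volume created roughly along these lines (just a sketch to illustrate the brick ordering; the brick paths below are placeholders, not our actual ones):

$ # every third brick in the list becomes the arbiter of its replica set
$ gluster volume create data replica 3 arbiter 1 \
      s0:/gluster/data/brick s1:/gluster/data/brick s2:/gluster/data/arbiter \
      s2:/gluster/data/brick s3:/gluster/data/brick s0:/gluster/data/arbiter
$ gluster volume start data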
2. How many bricks does the snapshot data-bck_GMT-2017.02.09-14.15.43 have?
6 bricks, including the 2 arbiters.
3. Was the snapshot clone command issued from a node which did not have any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?
All commands were issued from s0. All volumes have bricks on every node in the cluster.
4. I see you tried to delete the new cloned volume. Did the new cloned volume end up in this state after the failure to create the clone, or after the failure to delete the clone?
I noticed there was something wrong as soon as I created the clone. The clone command completed; however, I was then unable to do anything with it because the clone didn't exist on s1-s3.
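For what it's worth, a quick per-node check along these lines shows the clone only on s0 (nothing fancy, just run on each of s0-s3):

$ gluster volume list          # data-teste is listed only on s0
$ ls /var/lib/glusterd/vols/   # the data-teste directory is only present on s0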
If you want to remove the half-baked volume from the cluster, please proceed with the following steps.
1. Bring down glusterd on all nodes by running the following command on each node:
$ systemctl stop glusterd
Verify that glusterd is down on all nodes by running the following command on each node:
$ systemctl status glusterd
2. Delete the following directory from all the nodes on which it exists:
/var/lib/glusterd/vols/data-teste
The directory only exists on s0, but stopping glusterd on only s0 & deleting the directory didn't work; the directory was restored as soon as glusterd was restarted. I haven't yet tried stopping glusterd on *all* nodes before doing this, although I'll need to plan for that, as it'll take the entire cluster offline.
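When I do get the downtime, the plan is roughly the following (just a sketch, assuming passwordless ssh from s0 to s0-s3; corrections welcome):

# stop glusterd everywhere and confirm it is down
for n in s0 s1 s2 s3; do ssh $n "systemctl stop glusterd"; done
for n in s0 s1 s2 s3; do ssh $n "systemctl status glusterd"; done

# remove the stale volume definition wherever it exists
for n in s0 s1 s2 s3; do ssh $n "rm -rf /var/lib/glusterd/vols/data-teste"; done

# bring glusterd back up
for n in s0 s1 s2 s3; do ssh $n "systemctl start glusterd"; done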
Thanks for the reply,
Doug
Regards,
Avra
On 02/16/2017 08:01 PM, Gambit15 wrote:
Hey guys, I tried to create a new volume from a cloned snapshot yesterday, however something went wrong during the process & I'm now stuck with the new volume being created on the server I ran the commands on (s0), but not on the rest of the peers. I'm unable to delete this new volume from the server, as it doesn't exist on the peers. Any insights into what may have gone wrong? What do I do?
CentOS 7.3.1611, Gluster 3.8.8
The command history & extract from etc-glusterfs-glusterd.vol.log are included below.
gluster volume list
gluster snapshot list
gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
gluster volume status data-teste
gluster volume delete data-teste
gluster snapshot create teste data
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
gluster snapshot status
gluster snapshot activate teste_GMT-2017.02.15-12.44.04
gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
[2017-02-15 12:43:21.667403] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-teste
[2017-02-15 12:43:21.682530] E [MSGID: 106301] [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume data-teste is not started
[2017-02-15 12:43:43.633031] I [MSGID: 106495] [glusterd-handler.c:3128:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2017-02-15 12:43:43.640597] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2) [0x7ffb396a14b2] -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65) [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7ffb44ec31c5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post --volname=data-teste
[2017-02-15 13:05:20.103423] E [MSGID: 106122] [glusterd-snapshot.c:2397:glusterd_snapshot_clone_prevalidate] 0-management: Failed to pre validate
[2017-02-15 13:05:20.103464] E [MSGID: 106443] [glusterd-snapshot.c:2413:glusterd_snapshot_clone_prevalidate] 0-management: One or more bricks are not running. Please run snapshot status command to see brick status.
Please start the stopped brick and then issue snapshot clone command
[2017-02-15 13:05:20.103481] W [MSGID: 106443] [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate] 0-management: Snapshot clone pre-validation failed
[2017-02-15 13:05:20.103492] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2017-02-15 13:05:20.103503] E [MSGID: 106122] [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2017-02-15 13:05:20.103514] E [MSGID: 106122] [glusterd-mgmt.c:2243:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2017-02-15 13:05:20.103531] E [MSGID: 106027] [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone data-teste volinfo
[2017-02-15 13:05:20.103542] W [MSGID: 106444] [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2017-02-15 13:05:20.103561] W [MSGID: 106121] [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2017-02-15 13:05:20.103572] E [MSGID: 106121] [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2017-02-15 13:05:20.103582] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2017-02-15 13:11:15.862858] W [MSGID: 106057] [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 0-management: Snap volume c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick1-data-brick not found [Invalid argument]
[2017-02-15 13:11:16.314759] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick on port 49452
[2017-02-15 13:11:16.316090] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-02-15 13:11:16.348867] W [MSGID: 106057] [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 0-management: Snap volume c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick6-data-arbiter not found [Invalid argument]
[2017-02-15 13:11:16.558878] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter on port 49453
[2017-02-15 13:11:16.559883] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-02-15 13:11:23.279721] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: taking snapshot of the brick (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick) of device /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_0 failed
[2017-02-15 13:11:23.279790] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
[2017-02-15 13:11:23.279806] E [MSGID: 106030] [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick s0:/run/gluster/snaps/data-teste/brick1/data/brick volume(data-teste)
[2017-02-15 13:11:23.286678] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: taking snapshot of the brick (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter) of device /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_1 failed
[2017-02-15 13:11:23.286735] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
[2017-02-15 13:11:23.286749] E [MSGID: 106030] [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick s0:/run/gluster/snaps/data-teste/brick6/data/arbiter volume(data-teste)
[2017-02-15 13:11:23.286793] E [MSGID: 106030] [glusterd-snapshot.c:6626:glusterd_schedule_brick_snapshot] 0-management: Failed to create snapshot
[2017-02-15 13:11:23.286813] E [MSGID: 106441] [glusterd-snapshot.c:6796:glusterd_snapshot_clone_commit] 0-management: Failed to take backend snapshot data-teste
[2017-02-15 13:11:25.530666] E [MSGID: 106442] [glusterd-snapshot.c:8308:glusterd_snapshot] 0-management: Failed to clone snapshot
[2017-02-15 13:11:25.530721] W [MSGID: 106123] [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2017-02-15 13:11:25.530735] E [MSGID: 106123] [glusterd-mgmt.c:1427:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2017-02-15 13:11:25.530749] E [MSGID: 106123] [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2017-02-15 13:11:25.532312] E [MSGID: 106027] [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone data-teste volinfo
[2017-02-15 13:11:25.532339] W [MSGID: 106444] [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2017-02-15 13:11:25.532353] W [MSGID: 106121] [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2017-02-15 13:11:25.532367] E [MSGID: 106121] [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2017-02-15 13:11:25.532381] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2017-02-15 13:29:53.779020] E [MSGID: 106062] [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap UUID
[2017-02-15 13:29:53.779073] E [MSGID: 106099] [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict
[2017-02-15 13:29:53.779096] E [MSGID: 106108] [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2017-02-15 13:29:53.779136] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s3. Please check log file for details.
[2017-02-15 13:29:54.136196] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s1. Please check log file for details.
The message "E [MSGID: 106108] [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed to aggregate response from node/brick" repeated 2 times between [2017-02-15 13:29:53.779096] and [2017-02-15 13:29:54.535080]
[2017-02-15 13:29:54.535098] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on s2. Please check log file for details.
[2017-02-15 13:29:54.535320] E [MSGID: 106123] [glusterd-mgmt.c:1490:glusterd_mgmt_v3_commit] 0-management: Commit failed on peers
[2017-02-15 13:29:54.535370] E [MSGID: 106123] [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2017-02-15 13:29:54.539708] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s1. Please check log file for details.
[2017-02-15 13:29:54.539797] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s3. Please check log file for details.
[2017-02-15 13:29:54.539856] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post Validation failed on s2. Please check log file for details.
[2017-02-15 13:29:54.540224] E [MSGID: 106121] [glusterd-mgmt.c:1713:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed on peers
[2017-02-15 13:29:54.540256] E [MSGID: 106122] [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
The message "E [MSGID: 106062] [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap UUID" repeated 2 times between [2017-02-15 13:29:53.779020] and [2017-02-15 13:29:54.535075]
The message "E [MSGID: 106099] [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict" repeated 2 times between [2017-02-15 13:29:53.779073] and [2017-02-15 13:29:54.535078]
[2017-02-15 13:31:14.285666] I [MSGID: 106488] [glusterd-handler.c:1537:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2017-02-15 13:32:17.827422] E [MSGID: 106027] [glusterd-handler.c:4670:glusterd_get_volume_opts] 0-management: Volume cluster.locking-scheme does not exist
[2017-02-15 13:34:02.635762] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s1. Volume data-teste does not exist
[2017-02-15 13:34:02.635838] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s2. Volume data-teste does not exist
[2017-02-15 13:34:02.635889] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on s3. Volume data-teste does not exist
[2017-02-15 13:34:02.636092] E [MSGID: 106122] [glusterd-mgmt.c:947:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2017-02-15 13:34:02.636132] E [MSGID: 106122] [glusterd-mgmt.c:2009:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
[2017-02-15 13:34:20.313228] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
[2017-02-15 13:34:20.313320] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
[2017-02-15 13:34:20.313377] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796455] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796830] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
[2017-02-15 13:34:36.796896] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
Many thanks!
D
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-users