Hello,

> All volfiles are autogenerated based on the info available in the other files
> in /var/lib/glusterd/vols/<name>/ (like ./info, ./bricks/*). So to manually fix
> your "situation", please make sure the contents in the files ./info,
> ./node_state.info ./rbstate ./bricks/* are "proper" (you can either share
> them with me offline, or compare them with another volume which is

I will consult with management about sharing. However, I have found no obvious differences in the .vol files. Bricks on the faulty servers are explicitly defined in them the same way as the healthy bricks.

> good), and issue a "gluster volume reset <volname>" to re-write fresh
> volfiles.

Is this really the correct command? According to the help text it resets configuration options:

    volume reset <VOLNAME> [option] [force] - reset all the reconfigured options

> It is also a good idea to double check the contents of
> /var/lib/glusterd/peers/* is proper too.

Only the dead 00022 server is missing, as peer probe is not able to probe it. 00031 was reattached (but is still not recognized as part of the volume).

> Doing these manual steps and restarting all processes should recover you
> from pretty much any situation.

Yeah, I thought so. However, after any such edit the volume refused to start (glusterfs-server failed to start), complaining about unknown keys and listing brick numbers from the info file.

> Back to the cause of the problem - it appears to be the case that the ongoing
> replace-brick got messed up when yet another server died.

I believe it got messed up (or finally got messed up) when I desperately attempted to remove bricks and issued:

    gluster peer detach 00022 force
    gluster peer detach 00031 force

hoping that this would let me break the migration in progress and then remove/replace those servers.
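For anyone else trying to act on the "make sure ./info, ./node_state.info, ./rbstate, ./bricks/* are proper" advice: a minimal sketch of a helper that diffs those files between the local /var/lib/glusterd/vols/<name>/ directory and a copy fetched from a known-good peer. The function name and the two-directory layout are my own assumptions, not anything from the thread.

```shell
# Hypothetical helper: diff the glusterd metadata files between two copies of
# a /var/lib/glusterd/vols/<name>/ directory (e.g. the local one vs. a copy
# pulled from a healthy peer). File list taken from the advice above.
compare_voldirs() {
    dir_a=$1; dir_b=$2; rc=0
    for f in info node_state.info rbstate; do
        # -u prints a readable unified diff whenever the two copies disagree
        diff -u "$dir_a/$f" "$dir_b/$f" || rc=1
    done
    return $rc
}
```

Any diff output points at the exact file to fix before restarting glusterd; no output means the metadata matches.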
> A different way of achieving what you want, is to use add-brick + remove-
> brick for decommissioning servers (i.e, add-brick the new server
> - 00028, and "remove-brick start" the old one - 00031, and "remove-brick
> commit" once all the data has drained out). Moving forward this will be the
> recommended way to decommission servers. Use replace-brick to only
> replace an already dead server - 00022 with its replacement).

I am using a distributed-replicated volume, so I can only add/remove servers in replica pairs. Also, the remove-brick command has issues with open files, at least with the semiosis package. The COW files (base disk + diff) of any active KVM instance get corrupted as soon as either of the following commands touches the data:

    remove-brick
    rebalance

What I have observed is that corruption occurs even in the base disk file, which in OpenStack is shared by a number of instances, so corrupting that single file causes faults on multiple VMs. Recovery can be impossible, because the instances write corrupt data into their own diff files, and replacing the base file with a proper copy then no longer helps them. I have tested this several times and found that replace-brick, for some reason, works properly and does not cause issues with open files.

> > I am using Semiosis 3.3.1 package on Ubuntu 12.04:
> > dpkg -l | grep gluster
> > rc  glusterfs         3.3.0-1                  clustered file-system
> > ii  glusterfs-client  3.3.1-ubuntu1~precise8   clustered file-system (client package)
> > ii  glusterfs-common  3.3.1-ubuntu1~precise8   GlusterFS common libraries and translator modules
> > ii  glusterfs-server  3.3.1-ubuntu1~precise8   clustered file-system (server package)

I have run glusterfs-server in debug mode and saw the following when I attempted to replace a brick with force. It seems that Gluster is unable to carry out a volume change command if one of the nodes does not respond, even when force is given.
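For reference, the add-brick + remove-brick flow suggested above, adapted to whole replica pairs as noted in my reply. This is only a sketch: SPARE2 is a hypothetical second replacement host (the thread only names one spare, 00028), exact syntax may differ between versions, and the commands are printed rather than executed so that each step (especially remove-brick status) can be verified before moving on.

```shell
# Sketch of the suggested decommission flow for a replica-2 volume.
# Names come from this thread except SPARE2, which is hypothetical;
# on a distributed-replicated volume bricks move as whole replica pairs.
VOL=glustervmstore
OLD="00031:/mnt/vmstore/brick 00036:/mnt/vmstore/brick"   # pair being drained
NEW="00028:/mnt/vmstore/brick SPARE2:/mnt/vmstore/brick"  # replacement pair

# Printed, not run: review each line, and wait for remove-brick status to
# report completion before issuing the commit.
echo "gluster volume add-brick $VOL $NEW"
echo "gluster volume remove-brick $VOL $OLD start"
echo "gluster volume remove-brick $VOL $OLD status"
echo "gluster volume remove-brick $VOL $OLD commit"
```

Given the open-file corruption described above, this flow is probably only safe once the VMs on the affected bricks are stopped or migrated off.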
I would expect "force" to ignore such issues, especially when the change does not involve the replica set whose node is down. However, in the end I receive the following error, which does not seem to relate to the log:

    brick: 00031:/mnt/vmstore/brick does not exist in volume: glustervmstore

In my case the replica pairs are:

    00031 -- 00036   (I am replacing 00031 with the spare 00028)
    00022 -- 00024   (00022 has had a disk failure; the system is offline and unable to respond)

[2013-06-19 09:56:21.520991] D [glusterd-utils.c:941:glusterd_volinfo_find] 0-: Volume glustervmstore found
[2013-06-19 09:56:21.521014] D [glusterd-utils.c:949:glusterd_volinfo_find] 0-: Returning 0
[2013-06-19 09:56:21.521060] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2013-06-19 09:56:21.521095] D [glusterd-utils.c:783:glusterd_brickinfo_from_brick] 0-: Returning 0
[2013-06-19 09:56:21.521126] D [glusterd-utils.c:585:glusterd_volinfo_new] 0-: Returning 0
[2013-06-19 09:56:21.521170] D [glusterd-utils.c:672:glusterd_volume_brickinfos_delete] 0-: Returning 0
[2013-06-19 09:56:21.521201] D [glusterd-utils.c:701:glusterd_volinfo_delete] 0-: Returning 0
[2013-06-19 09:56:21.521233] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2013-06-19 09:56:21.521261] D [glusterd-utils.c:783:glusterd_brickinfo_from_brick] 0-: Returning 0
[2013-06-19 09:56:21.521290] D [glusterd-utils.c:585:glusterd_volinfo_new] 0-: Returning 0
[2013-06-19 09:56:21.521322] D [glusterd-utils.c:672:glusterd_volume_brickinfos_delete] 0-: Returning 0
[2013-06-19 09:56:21.521350] D [glusterd-utils.c:701:glusterd_volinfo_delete] 0-: Returning 0
[2013-06-19 09:56:21.521385] D [glusterd-utils.c:4344:glusterd_is_rb_started] 0-: is_rb_started:status=0
[2013-06-19 09:56:21.521417] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: 00031:/mnt/vmstore/brick
[2013-06-19 09:56:21.521457] D [glusterd-utils.c:4115:glusterd_friend_find_by_hostname] 0-management: Friend 0031 found.. state: 3
[2013-06-19 09:56:21.521485] D [glusterd-utils.c:4198:glusterd_hostname_to_uuid] 0-: returning 0
[2013-06-19 09:56:21.524381] D [glusterd-utils.c:4164:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: 00022
[2013-06-19 09:56:21.525286] D [glusterd-utils.c:234:glusterd_is_local_addr] 0-management: 10.x.x.x
[2013-06-19 09:56:21.525346] D [glusterd-utils.c:234:glusterd_is_local_addr] 0-management: 10.x.x.x
[2013-06-19 09:56:21.525371] D [glusterd-utils.c:234:glusterd_is_local_addr] 0-management: 10.x.x.x
[2013-06-19 09:56:21.525392] D [glusterd-utils.c:255:glusterd_is_local_addr] 0-management: 00022 is not local
[2013-06-19 09:56:21.525407] D [glusterd-utils.c:4198:glusterd_hostname_to_uuid] 0-: returning 1
[2013-06-19 09:56:21.525421] D [glusterd-utils.c:739:glusterd_resolve_brick] 0-: Returning 1
[2013-06-19 09:56:21.525434] D [glusterd-utils.c:838:glusterd_volume_brickinfo_get] 0-: Returning -1
[2013-06-19 09:56:21.525447] D [glusterd-utils.c:881:glusterd_volume_brickinfo_get_by_brick] 0-: Returning -1
[2013-06-19 09:56:21.525464] D [glusterd-replace-brick.c:504:glusterd_op_stage_replace_brick] 0-: Returning -1
[2013-06-19 09:56:21.525477] D [glusterd-op-sm.c:2968:glusterd_op_stage_validate] 0-: Returning -1
[2013-06-19 09:56:21.525491] E [glusterd-op-sm.c:1999:glusterd_op_ac_send_stage_op] 0-: Staging failed
[2013-06-19 09:56:21.525507] D [glusterd-op-sm.c:4539:glusterd_op_sm_inject_event] 0-glusterd: Enqueue event: 'GD_OP_EVENT_RCVD_RJT'
[2013-06-19 09:56:21.525521] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2013-06-19 09:56:21.525535] D [glusterd-op-sm.c:4539:glusterd_op_sm_inject_event] 0-glusterd: Enqueue event: 'GD_OP_EVENT_ALL_ACC'
[2013-06-19 09:56:21.525549] D [glusterd-op-sm.c:144:glusterd_op_sm_inject_all_acc] 0-: Returning 0
[2013-06-19 09:56:21.525561] D [glusterd-op-sm.c:2044:glusterd_op_ac_send_stage_op] 0-: Returning with 0
[2013-06-19 09:56:21.525575] D
[glusterd-utils.c:4719:glusterd_sm_tr_log_transition_add] 0-glusterd: Transitioning from 'Lock sent' to 'Stage op sent' due to event 'GD_OP_EVENT_ALL_ACC'