remove-brick removed unexpected bricks

ravishankar at redhat.com (Ravishankar N) · Tue, 13 Aug 2013 19:19:18 +0530



On 08/13/2013 06:21 PM, Cool wrote:
> I'm pretty sure I ran "watch ... remove-brick ... status" until it 
> reported that everything had completed before I triggered the commit; I 
> should have made that clear in my previous mail.
>
> Actually you can read my mail again - in step #5, files on /sdc1 got 
> migrated instead of /sdd1, even though my command was trying to 
> remove-brick /sdd1, 
Ah, my bad. Got it now. This is strange..
> and to me this is the root cause of the problem: data on /sdc1 was 
> migrated to /sdb1 and /sdd1, and then the commit simply removed /sdd1 
> from gfs_v0. It seems something is wrong with the volume definition 
> information in gluster.
If you are able to reproduce the issue, does 'gluster volume info' show 
the correct bricks before and after the start, status and commit 
operations for removing sdd1? You could also check for error messages in 
/var/log/glusterfs/<volname>-rebalance.log
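
One way to capture that comparison is sketched below. It is only a sketch: the helper names (brick_list, snapshot_bricks, rebalance_errors) are made up for illustration, it assumes 'gluster volume info' prints one "BrickN: host:/path" line per brick, and it assumes error-level lines in the log carry the "] E [" marker that glusterfs logs normally use.

```shell
# Print only the "BrickN: host:/path" lines of 'gluster volume info'.
# $1 = volume name
brick_list() {
    gluster volume info "$1" | grep -E '^Brick[0-9]+:'
}

# Save the current brick list of volume $1 into file $2.
snapshot_bricks() {
    brick_list "$1" > "$2"
}

# Show error-level lines from the rebalance log of volume $1.
# $2 optionally overrides the log directory (default /var/log/glusterfs).
# Assumption: error lines look like "[timestamp] E [source] ...".
rebalance_errors() {
    grep -F '] E [' "${2:-/var/log/glusterfs}/$1-rebalance.log"
}

# Possible usage around a remove-brick cycle (file names are arbitrary):
#   snapshot_bricks gfs_v0 /tmp/bricks.before
#   ... remove-brick start, status until completed, commit ...
#   snapshot_bricks gfs_v0 /tmp/bricks.after
#   diff /tmp/bricks.before /tmp/bricks.after
#   rebalance_errors gfs_v0
```

If only sdd1 was removed, the diff should show just the gfs11:/sdd1 and gfs12:/sdd1 lines disappearing.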

-Ravi
>
> -C.B.
>
> On 8/12/2013 9:51 PM, Ravishankar N wrote:
>> On 08/13/2013 03:43 AM, Cool wrote:
>>> remove-brick in 3.4.0 seems to be removing the wrong bricks; can 
>>> someone help review the environment/steps to see if I did anything 
>>> stupid?
>>>
>>> setup - Ubuntu 12.04LTS on gfs11 and gfs12, with following packages 
>>> from ppa, both nodes have 3 xfs partitions sdb1, sdc1, sdd1:
>>> ii  glusterfs-client 3.4.0final-ubuntu1~precise1 clustered file-system (client package)
>>> ii  glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common libraries and translator modules
>>> ii  glusterfs-server 3.4.0final-ubuntu1~precise1 clustered file-system (server package)
>>>
>>> steps to reproduce the problem:
>>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
>>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>>> 4. rebalance to distribute files across all three pairs of disks
>>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on 
>>> ***/sdc1*** are migrating out
>>> 6. remove-brick commit led to data loss in gfs_v0
>>>
>>> If between step 5 and 6 I initiate a remove-brick targeting /sdc1, 
>>> then after commit I would not lose anything since all data will be 
>>> migrated back to /sdb1.
>>>
>>
>> You should ensure that a 'remove-brick start' has completed and 
>> then commit it before initiating the second one. The correct way to 
>> do this would be:
>> 5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
>> 6. Check that the data migration has been completed using the status 
>> command:
>>    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
>> 7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
>> 8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
>> 9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
>> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>>
>> This would leave you with the original replica 2 volume that you had 
>> begun with. Hope this helps.
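
The start and status part of the sequence above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names and the POLL_SECONDS variable are hypothetical, and it assumes the status table shows the words "in progress" for nodes still migrating and "completed" for finished ones. The commit is deliberately left as a manual step so the brick list can be checked first.

```shell
# Succeed only when the status output reports at least one "completed"
# node and no node is still "in progress".
# $1 = volume name, remaining args = bricks being removed
remove_brick_done() {
    vol=$1; shift
    out=$(gluster volume remove-brick "$vol" "$@" status) || return 1
    printf '%s\n' "$out" | grep -q 'completed' || return 1
    ! printf '%s\n' "$out" | grep -q 'in progress'
}

# Start the remove-brick task, then poll until migration has finished.
# Note: this loops until completion; a failed migration needs manual
# attention (interrupt the loop and inspect the status output).
remove_brick_wait() {
    vol=$1; shift
    gluster volume remove-brick "$vol" "$@" start || return 1
    until remove_brick_done "$vol" "$@"; do
        sleep "${POLL_SECONDS:-10}"
    done
}

# Possible usage, one brick pair at a time:
#   remove_brick_wait gfs_v0 gfs11:/sdd1 gfs12:/sdd1
#   gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
```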
>>
>> Note:
>> The latest version of glusterfs has a check that prevents a second 
>> remove-brick operation until the first one has been committed.
>> (You would receive a message like this: "volume remove-brick start: 
>> failed: An earlier remove-brick task exists for volume <volname>. 
>> Either commit it or stop it before starting a new task.")
>>
>> -Ravi
>>
>>
>>> -C.B.