On 08/13/2013 06:21 PM, Cool wrote:
> I'm pretty sure I did "watch ... remove-brick ... status" till it
> said everything was completed before triggering commit; I should have
> made that clear in my previous mail.
>
> Actually, if you read my mail again - in step #5, files on /sdc1 got
> migrated instead of /sdd1, even though my command was trying to
> remove-brick /sdd1,

Ah, my bad. Got it now. This is strange..

> this is (to me) the root cause of the problem: data on /sdc1 migrated
> to /sdb1 and /sdd1, then commit simply removed /sdd1 from gfs_v0. It
> seems the volume definition information has some problem in gluster.

If you are able to reproduce the issue, does 'gluster volume info' show
the correct bricks before and after the start-status-commit operations
of removing sdd1? You could also check for error messages in
/var/log/glusterfs/<volname>-rebalance.log

-Ravi

> -C.B.
>
> On 8/12/2013 9:51 PM, Ravishankar N wrote:
>> On 08/13/2013 03:43 AM, Cool wrote:
>>> remove-brick in 3.4.0 seems to be removing the wrong bricks; can
>>> someone help review the environment/steps to see if I did anything
>>> stupid?
>>>
>>> setup - Ubuntu 12.04LTS on gfs11 and gfs12, with the following
>>> packages from ppa; both nodes have 3 xfs partitions sdb1, sdc1, sdd1:
>>> ii glusterfs-client 3.4.0final-ubuntu1~precise1 clustered
>>> file-system (client package)
>>> ii glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common
>>> libraries and translator modules
>>> ii glusterfs-server 3.4.0final-ubuntu1~precise1 clustered
>>> file-system (server package)
>>>
>>> steps to reproduce the problem:
>>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
>>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>>> 4. rebalance to distribute files across all three pairs of disks
>>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on
>>> ***/sdc1*** are migrating out
>>> 6. remove-brick commit led to data loss in gfs_v0
>>>
>>> If between step 5 and 6 I initiate a remove-brick targeting /sdc1,
>>> then after commit I would not lose anything, since all data would be
>>> migrated back to /sdb1.
>>>
>>
>> You should ensure that a 'remove-brick start' has completed, and
>> then commit it, before initiating the second one. The correct way to
>> do this would be:
>> 5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
>> 6. Check that the data migration has completed using the status
>>    command:
>>    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
>> 7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
>> 8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
>> 9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
>> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>>
>> This would leave you with the original replica 2 volume you began
>> with. Hope this helps.
>>
>> Note:
>> The latest version of glusterfs has a check that prevents a second
>> remove-brick operation until the first one has been committed.
>> (You would receive a message such as: "volume remove-brick start:
>> failed: An earlier remove-brick task exists for volume <volname>.
>> Either commit it or stop it before starting a new task.")
>>
>> -Ravi
>>
>>
>>> -C.B.
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
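
For anyone following the checks suggested above, here is a minimal sketch
of the verification sequence, assuming the volume name gfs_v0 and brick
paths from the thread and the default Ubuntu log location; adjust names
and paths to your own setup:

    # Record the brick layout before starting the removal.
    gluster volume info gfs_v0

    # Start decommissioning the sdd1 replica pair and watch migration
    # progress until every node reports "completed".
    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status

    # Look for migration errors while the data moves (log path assumes
    # the default /var/log/glusterfs directory).
    grep -iE "error|failed" /var/log/glusterfs/gfs_v0-rebalance.log

    # Compare the layout again, and only then commit the removal.
    gluster volume info gfs_v0
    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit

Comparing the two 'gluster volume info' outputs should show whether the
volume definition still lists the bricks you expect before the commit is
made permanent.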