I'm pretty sure I did "watch ... remove-brick ... status" until it reported
that everything was completed before triggering the commit; I should have
made that clear in my previous mail.

If you read my mail again: in step #5, files on /sdc1 got migrated instead
of /sdd1, even though my command was removing /sdd1. To me this is the root
cause of the problem: data on /sdc1 was migrated to /sdb1 and /sdd1, and the
commit then simply removed /sdd1 from gfs_v0. It looks like the volume
definition information got mixed up somewhere in gluster.

-C.B.

On 8/12/2013 9:51 PM, Ravishankar N wrote:
> On 08/13/2013 03:43 AM, Cool wrote:
>> remove-brick in 3.4.0 seems to be removing the wrong bricks; can someone
>> help review the environment/steps to see if I did anything stupid?
>>
>> Setup: Ubuntu 12.04 LTS on gfs11 and gfs12, with the following packages
>> from the PPA; both nodes have three xfs partitions sdb1, sdc1, sdd1:
>> ii glusterfs-client 3.4.0final-ubuntu1~precise1 clustered file-system (client package)
>> ii glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common libraries and translator modules
>> ii glusterfs-server 3.4.0final-ubuntu1~precise1 clustered file-system (server package)
>>
>> Steps to reproduce the problem:
>> 1. create volume gfs_v0 with replica 2 using gfs11:/sdb1 and gfs12:/sdb1
>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>> 4. rebalance so files are distributed across all three pairs of disks
>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start; files on ***/sdc1***
>>    are migrating out
>> 6. remove-brick commit led to data loss in gfs_v0
>>
>> If between steps 5 and 6 I initiate a remove-brick targeting /sdc1, then
>> after the commit I would not lose anything, since all data would be
>> migrated back to /sdb1.
>>
>
> You should ensure that a 'remove-brick start' has completed, and then
> commit it, before initiating the second one. The correct way to do this
> would be:
> 5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
> 6. Check that the data migration has completed using the status command:
>    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
> 7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
> 8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
> 9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>
> This would leave you with the original replica 2 volume that you began
> with. Hope this helps.
>
> Note:
> The latest version of glusterfs has a check that prevents a second
> remove-brick operation until the first one has been committed.
> (You would receive a message like: "volume remove-brick start: failed:
> An earlier remove-brick task exists for volume <volname>. Either commit
> it or stop it before starting a new task.")
>
> -Ravi
>
>> -C.B.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
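
P.S. To be concrete, here is a sketch of what my step #5/#6 boiled down to,
using the brick names from my steps above (the exact command lines are
reconstructed from what I described, not a paste of my shell history):

  # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
  # watch gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
    (kept this running until every node showed "completed")
  # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit

During the watch/status phase it was /sdc1, not /sdd1, that was being
drained, which is why the commit ended up discarding data.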