On 08/15/2013 11:11 AM, Cool wrote:
> I'm going to stop debugging this, as I still cannot figure out how to
> reproduce the problem for further debugging. I did 4~5 rounds of tests
> (all from scratch) yesterday and today, and only hit the problem once, on
> Monday afternoon; repeating the same steps didn't give me the same result.
> I also checked the logs: there was nothing wrong, except that the rebalance
> was happening on the wrong bricks.
>
> I will raise this again if I can get any useful information.
>
> -C.B.
>
Sure C.B., thanks for your efforts.
> On 8/13/2013 7:00 AM, Cool wrote:
>> Thanks Ravi. I managed to reproduce the issue twice in the past several
>> days, but without anything significant in the logs. 'volume info' before
>> and after shows the expected bricks (i.e. sdd1 got removed, even though
>> its data was not migrated out), while rebalance.log says it was migrating
>> data out of sdc1, not sdd1.
>>
>> I'm doing another try now with -L TRACE to see if I can get more log
>> information. This will take some time; I will post here if I find
>> anything helpful.
>>
>> -C.B.
>> On 8/13/2013 6:49 AM, Ravishankar N wrote:
>>> On 08/13/2013 06:21 PM, Cool wrote:
>>>> I'm pretty sure I did "watch ... remove-brick ... status" until it
>>>> reported that everything was completed before triggering the commit;
>>>> I should have made that clear in my previous mail.
>>>>
>>>> Actually, please read my mail again - in step #5, files on /sdc1 got
>>>> migrated instead of /sdd1, even though my command was trying to
>>>> remove-brick /sdd1.
>>> Ah, my bad. Got it now. This is strange..
>>>> To me this is the root cause of the problem: data on /sdc1 migrated to
>>>> /sdb1 and /sdd1, and the commit then simply removed /sdd1 from gfs_v0.
>>>> It seems the volume definition information in gluster has some problem.
>>> If you are able to reproduce the issue, does 'gluster volume info' show
>>> the correct bricks before and after the start-status-commit operations
>>> for removing sdd1? You could also see if there are any error messages
>>> in /var/log/glusterfs/<volname>-rebalance.log
>>>
>>> -Ravi
>>>>
>>>> -C.B.
>>>>
>>>> On 8/12/2013 9:51 PM, Ravishankar N wrote:
>>>>> On 08/13/2013 03:43 AM, Cool wrote:
>>>>>> remove-brick in 3.4.0 seems to remove the wrong bricks. Can someone
>>>>>> help review the environment/steps to see if I did anything stupid?
>>>>>>
>>>>>> Setup: Ubuntu 12.04 LTS on gfs11 and gfs12, with the following
>>>>>> packages from the PPA; both nodes have 3 xfs partitions (sdb1, sdc1,
>>>>>> sdd1):
>>>>>> ii glusterfs-client 3.4.0final-ubuntu1~precise1 clustered
>>>>>> file-system (client package)
>>>>>> ii glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common
>>>>>> libraries and translator modules
>>>>>> ii glusterfs-server 3.4.0final-ubuntu1~precise1 clustered
>>>>>> file-system (server package)
>>>>>>
>>>>>> Steps to reproduce the problem:
>>>>>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
>>>>>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>>>>>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>>>>>> 4. rebalance to distribute files across all three pairs of disks
>>>>>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start - files on
>>>>>> ***/sdc1*** are migrated out
>>>>>> 6. remove-brick commit led to data loss in gfs_v0
>>>>>>
>>>>>> If between steps 5 and 6 I initiate a remove-brick targeting /sdc1,
>>>>>> then after the commit I would not lose anything, since all data
>>>>>> would be migrated back to /sdb1.
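
For reference, the reproduction sequence above maps roughly onto the gluster
CLI commands below. This is a sketch, not the poster's exact commands: peer
probing, volume start, and client-side file creation are assumed, and the
brick paths are simply the ones named in the post.

    # Reproduction sketch, run on gfs11 (gfs12 already peer-probed).
    gluster volume create gfs_v0 replica 2 gfs11:/sdb1 gfs12:/sdb1   # step 1
    gluster volume start gfs_v0

    gluster volume add-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1          # step 2
    gluster volume add-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1          # step 3

    gluster volume rebalance gfs_v0 start                            # step 4
    gluster volume rebalance gfs_v0 status     # wait until it reports completion

    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start # step 5
    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
    # Observed problem: files were migrated off /sdc1 instead of /sdd1.

    gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit # step 6
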
>>>>>>
>>>>> You should ensure that a 'remove-brick start' has completed, and then
>>>>> commit it, before initiating the second one. The correct way to do
>>>>> this would be:
>>>>> 5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
>>>>> 6. Check that the data migration has completed, using the status
>>>>> command:
>>>>> # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
>>>>> 7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
>>>>> 8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
>>>>> 9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
>>>>> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>>>>>
>>>>> This would leave you with the original replica 2 volume that you began
>>>>> with. Hope this helps.
>>>>>
>>>>> Note:
>>>>> The latest version of glusterfs has a check that prevents a second
>>>>> remove-brick operation until the first one has been committed.
>>>>> (You would receive a message such as: "volume remove-brick start:
>>>>> failed: An earlier remove-brick task exists for volume <volname>.
>>>>> Either commit it or stop it before starting a new task.")
>>>>>
>>>>> -Ravi
>>>>>
>>>>>> -C.B.
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
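
Ravi's point about waiting for the status command before committing can be
scripted. Below is a minimal sketch, assuming the 'remove-brick ... status'
output reports "in progress" while migration is running, "completed" when it
is done, and "failed" on errors; verify the exact output of your gluster
release before relying on it.

    #!/bin/bash
    # Sketch: decommission a replica pair only after data migration completes.
    set -e

    VOL=gfs_v0
    BRICKS=(gfs11:/sdd1 gfs12:/sdd1)

    gluster volume remove-brick "$VOL" "${BRICKS[@]}" start

    # Poll until no node still reports the migration as "in progress".
    while gluster volume remove-brick "$VOL" "${BRICKS[@]}" status | grep -q "in progress"; do
        sleep 10
    done

    # Do not commit if any node reported a failure.
    if gluster volume remove-brick "$VOL" "${BRICKS[@]}" status | grep -qi "failed"; then
        echo "remove-brick reported failures; not committing" >&2
        exit 1
    fi

    # Sanity-check 'gluster volume info' and the rebalance log before this step:
    # commit removes the bricks from the volume definition.
    gluster volume remove-brick "$VOL" "${BRICKS[@]}" commit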