Thanks for your replies.

My first question is: can I safely issue a "commit" when the volume does not
seem to have drained? One of the other arrays failed completely near the end
of the draining, so I'm guaranteed some data loss in any event; I just don't
want to put the system in an unstable state. I have questions about that too,
but I'll save those for another message.

Thanks,
James Bellinger

> On 09/17/2013 03:26 AM, james.bellinger at icecube.wisc.edu wrote:
>> I inherited a system with a wide mix of array sizes (no replication) in
>> 3.2.2, and wanted to drain data from a failing array.
>>
>> I upgraded to 3.3.2, and began a
>>     gluster volume remove-brick scratch "gfs-node01:/sda" start
>>
>> After some time I got this:
>>     gluster volume remove-brick scratch "gfs-node01:/sda" status
>>     Node        Rebalanced-files     size     scanned   failures   status
>>     ----------  ----------------  -------  ----------  ---------  -----------
>>     localhost                  0   0Bytes           0          0  not started
>>     gfs-node06                 0   0Bytes           0          0  not started
>>     gfs-node03                 0   0Bytes           0          0  not started
>>     gfs-node05                 0   0Bytes           0          0  not started
>>     gfs-node01        2257394624    2.8TB     5161640     208878  completed
>>
>> Two things jump instantly to mind:
>> 1) The number of failures is rather large
>
> Can you check the rebalance logs (/var/log/scratch-rebalance.log) to
> figure out what the error messages are?
>
>> 2) A _different_ disk seems to have been _partially_ drained.
>>     /dev/sda   2.8T  2.7T   12G  100%  /sda
>>     /dev/sdb   2.8T  769G  2.0T   28%  /sdb
>>     /dev/sdc   2.8T  2.1T  698G   75%  /sdc
>>     /dev/sdd   2.8T  2.2T  589G   79%  /sdd
>
> I know this sounds silly, but just to be sure, is /dev/sda actually
> mounted on "gfs-node01:/sda"?
> If yes, the files that _were_ successfully rebalanced should have been
> moved from gfs-node01:/sda to one of the other bricks. Is that the case?
>
>> When I mount the system it is read-only (another problem I want to fix
>
> Again, the mount logs could shed some light ..
> (btw a successful rebalance start/status sequence should be followed by
> the rebalance 'commit' command to ensure the volume information gets
> updated)
>
>> ASAP) so I'm pretty sure the failures aren't due to users changing the
>> system underneath me.
>>
>> Thanks for any pointers.
>>
>> James Bellinger
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
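Since the safe sequence is remove-brick start → status → commit only once the
brick's node reports "completed" with an acceptable failure count, a sanity
check of the status output before committing can be sketched as below. This is
a minimal illustration, not part of gluster: the column layout is assumed to
match the table quoted above, `commit_looks_safe` is a hypothetical helper,
and in real use you would feed it the live command output (e.g. via
subprocess) rather than a pasted copy.

```python
# Minimal sketch: parse `gluster volume remove-brick ... status` output
# and decide whether issuing `commit` looks safe. The sample text below
# is the status table from the thread; the parsing assumes that layout.

STATUS_OUTPUT = """\
Node        Rebalanced-files     size     scanned   failures   status
----------  ----------------  -------  ----------  ---------  -----------
localhost                  0   0Bytes           0          0  not started
gfs-node06                 0   0Bytes           0          0  not started
gfs-node03                 0   0Bytes           0          0  not started
gfs-node05                 0   0Bytes           0          0  not started
gfs-node01        2257394624    2.8TB     5161640     208878  completed
"""

def commit_looks_safe(status_text, max_failures=0):
    """Return (safe, reasons) for a remove-brick status table."""
    reasons = []
    for line in status_text.splitlines()[2:]:  # skip header and rule line
        parts = line.split()
        if len(parts) < 6:
            continue
        node = parts[0]
        failures = int(parts[4])
        status = " ".join(parts[5:])
        # Only the node hosting the brick migrates data; the other
        # nodes legitimately report "not started".
        if status == "completed" and failures > max_failures:
            reasons.append(f"{node}: {failures} failures")
        elif status not in ("completed", "not started"):
            reasons.append(f"{node}: status is '{status}'")
    return (not reasons, reasons)

safe, why = commit_looks_safe(STATUS_OUTPUT)
print(safe, why)  # the 208878 failures on gfs-node01 flag this as unsafe
```

The point of the sketch is the decision rule, not the parsing: with 208878
failures reported, a commit would finalize the volume layout while a large
number of files never migrated off the brick, which is exactly the
data-loss concern raised above.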