Re: Migrating data from a failing filesystem


 



On 09/24/2014 07:35 PM, james.bellinger@xxxxxxxxxxxxxxxx wrote:
Thanks for the info!
I started the remove-brick start and, of course, the brick went read-only
in less than an hour.
This morning I checked the status a couple of minutes apart and found:

      Node  Rebalanced-files     size   scanned  failures       status
----------  ----------------  -------  --------  --------  -----------
gfs-node04              6634  590.7GB    81799     14868   in progress
...
gfs-node04              6669  596.5GB    86584     15271   in progress

I'm not sure exactly what it is doing here: between those two checks it
scanned 4785 files, logged 403 failures, and rebalanced only 35.
What it is supposed to be doing is scan all the files in the volume and, for the files present on itself (i.e. gfs-node04:/sdb), migrate (rebalance) them to the other bricks in the volume. Let it run to completion. The rebalance log should give you an idea of the 403 failures.
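If it helps, the error lines in the rebalance log can be tallied with something like the sketch below. The log path is an assumption on my part (it is typically named after the volume under /var/log/glusterfs/), so adjust it for your install:

```shell
# Tally error lines in a GlusterFS rebalance log.
# LOG path is an assumption; adjust for your install/volume name.
LOG=${LOG:-/var/log/glusterfs/scratch-rebalance.log}

count_errors() {
    # GlusterFS marks error lines with " E " in the severity column
    grep -c ' E ' "$1"
}

top_errors() {
    # Most frequent error messages, highest count first
    grep ' E ' "$1" | awk -F'] ' '{print $NF}' | sort | uniq -c | sort -rn | head
}
```

Running top_errors on the log usually makes it obvious whether the 403 failures are one repeated cause (e.g. the read-only FS) or many distinct ones.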
The used amount on the partition hasn't changed.
Probably because, after copying the files to the other bricks, the unlinks/rmdirs on the brick itself are failing since the FS is mounted read-only.
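A quick way to confirm the brick FS has flipped read-only again is to attempt a throwaway write on it (a sketch; /sdb is the failing brick path from this thread):

```shell
# Return success (0) if the given mount point refuses writes.
is_readonly() {
    t="$1/.rw-probe.$$"
    if touch "$t" 2>/dev/null; then
        rm -f "$t"
        return 1   # writable
    else
        return 0   # read-only (or otherwise unwritable)
    fi
}

# Example: is_readonly /sdb && echo "brick has gone read-only again"
```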
If anything, the _other_ brick on the server is shrinking!
Because the data is being copied into this brick as a part of migration?
(Which is related to the question I had before that you mention below.)

gluster volume remove-brick scratch gfs-node04:/sdb start
What is your original volume configuration? (gluster vol info scratch)?
but...
df /sda
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda             12644872688 10672989432 1844930140  86% /sda
...
/dev/sda             12644872688 10671453672 1846465900  86% /sda
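One way to see where the migrated data is actually landing is to sample df on each brick and compare the samples (a sketch; the mount points and the numbers are the ones quoted above):

```shell
# Print a mount point's used space in KB (df -P keeps output on one line).
used_kb() { df -P "$1" | awk 'NR==2 {print $3}'; }

# Difference between two samples, in KB (negative means the FS shrank).
delta() { awk -v a="$1" -v b="$2" 'BEGIN { print b - a }'; }

# With the two /sda samples quoted above:
#   delta 10672989432 10671453672  ->  -1535760   (about 1.5 GB freed)
```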

Have I shot myself in the other foot?
jim





On 09/23/2014 08:56 PM, james.bellinger@xxxxxxxxxxxxxxxx wrote:
I inherited a non-replicated gluster system based on antique hardware.

One of the brick filesystems is flaking out, and remounts read-only.  I
repair it and remount it, but this is only postponing the inevitable.

How can I migrate files off a failing brick that intermittently turns
read-only?  I have enough space, thanks to a catastrophic failure on
another brick; I don't want to present people with another one.  But if I
understand migration correctly, references have to be deleted, which
isn't possible if the filesystem turns read-only.
What you could do is initiate the migration with 'remove-brick start'
and monitor the progress with 'remove-brick status'. Irrespective of
whether the rebalance completes or fails (due to the brick turning
read-only), you can then update the volume configuration with
'remove-brick commit'. If the brick still has files left, mount the
gluster volume on that node and copy the files from the brick to the
volume via the mount.  You can then safely rebuild the array, add a
different brick, or whatever.
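Collected in one place, that sequence might look like the sketch below. The volume and brick names are the ones from this thread; the mount point and the .glusterfs exclusion are my assumptions. The script only prints the commands unless RUN=1, since 'remove-brick commit' is irreversible:

```shell
#!/bin/sh
# Dry-run wrapper: prints each command; executes only when RUN=1.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '+ %s\n' "$*"; fi; }

# 1. Start migrating data off the failing brick.
run gluster volume remove-brick scratch gfs-node04:/sdb start

# 2. Poll until the rebalance completes or fails.
run gluster volume remove-brick scratch gfs-node04:/sdb status

# 3. Update the volume configuration either way; this step is irreversible.
run gluster volume remove-brick scratch gfs-node04:/sdb commit

# 4. Mount the volume locally and copy any leftover files off the old
#    brick, skipping gluster's internal .glusterfs directory.
run mount -t glusterfs localhost:/scratch /mnt/scratch
run rsync -a --exclude=.glusterfs /sdb/ /mnt/scratch/
```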

What I want to do is migrate the files off, remove it from gluster,
rebuild the array, rebuild the filesystem, and then add it back as a
brick.  (Actually what I'd really like is to hear that the students are
all done with the system and I can turn the whole thing off, but theses
aren't complete yet.)

Any advice or words of warning will be appreciated.
Looks like your bricks are in trouble for over a year now
(http://gluster.org/pipermail/gluster-users.old/2013-September/014319.html).
Better get them fixed sooner than later! :-)
Oddly enough the old XRAID systems are holding up better than the VTRAK
arrays.  That doesn't help me much, though, since they're so small.

HTH,
Ravi

James Bellinger




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users







