Re: Remove a brick, rebuild it, put it back in

I've simulated the problem on 4 VMs in a distributed-replicated setup with a replica factor of 2. In each test I tore down a VM and brought it back up from a snapshot.

What has worked so far is this:

  1. Make a copy of /var/lib/glusterd from the affected machine and save it elsewhere.
  2. Configure your new machine (in my case I reverted to a VM snapshot). Assign the same IP and hostname!
  3. Install gluster.
  4. Stop the daemons if they are running.
  5. Nuke the /var/lib/glusterd directory and replace it with the copy saved in step 1.
  6. Create the brick directory.
  7. Get the volume-id extended attribute of the brick directory on a healthy node like so: getfattr -e base64 -n trusted.glusterfs.volume-id /data/brick_dir
  8. Apply that volume-id extended attribute to the new brick directory like so: setfattr -n trusted.glusterfs.volume-id -v 'the_value_you_got_in_7==' /data/brick_dir (keep the leading 0s that getfattr prints; it tells setfattr the value is base64-encoded).
  9. Start the daemons.
  10. FUSE-mount the Gluster volume through the locally running daemons, so /etc/fstab would contain something like: localhost:/gluster_volume /mnt/gluster glusterfs _netdev,defaults 0 0
  11. On the healthy partner machine, which has its own FUSE mount of the same volume, run something like: find /mnt/fuse -print0 | xargs -0 stat > /dev/null (the -print0/-0 pair keeps file names with spaces intact).
  12. Step 11 will make the files appear under the mount point on the new box, but they will not physically be in the brick directory yet. See step 13.
  13. Run the heal command (gluster volume heal gluster_volume) from the same host where you ran find; that finally syncs the files to the brick. Run the heal info command (gluster volume heal gluster_volume info) periodically; the number of entries pending heal should eventually drop to 0.
That's my experience with the VMs today.
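A minimal bash sketch of steps 4-8 and the heal polling in step 13, for anyone who wants to script the procedure. It is not a tested tool, and it assumes: glusterd runs as a systemd unit named "glusterd", the new node can ssh to a healthy node as root, brick paths are identical on both nodes, and "gluster volume heal VOLNAME info" prints one "Number of entries: N" line per brick. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the rebuild steps above. Function names are hypothetical.
# Assumptions: systemd unit "glusterd", root ssh to a healthy node,
# identical brick paths on both nodes.
set -euo pipefail

# Steps 4-5: stop glusterd and restore the /var/lib/glusterd copy from step 1.
restore_glusterd_state() {
  local backup_dir=$1
  systemctl stop glusterd
  rm -rf /var/lib/glusterd
  cp -a "$backup_dir" /var/lib/glusterd
}

# Read getfattr output on stdin and print the volume-id value.
# The leading "0s" emitted by "getfattr -e base64" is kept on purpose:
# it tells setfattr to decode the value as base64.
parse_volume_id() {
  sed -n 's/^trusted\.glusterfs\.volume-id=//p'
}

# Steps 6-8: create the brick directory and copy the volume-id
# extended attribute from the same brick path on a healthy node.
copy_volume_id() {
  local healthy_host=$1 brick_dir=$2 value
  value=$(ssh "root@${healthy_host}" \
    getfattr --absolute-names -e base64 -n trusted.glusterfs.volume-id "$brick_dir" |
    parse_volume_id)
  if [[ -z $value ]]; then
    echo "no volume-id found on ${healthy_host}:${brick_dir}" >&2
    return 1
  fi
  mkdir -p "$brick_dir"
  setfattr -n trusted.glusterfs.volume-id -v "$value" "$brick_dir"
}

# Sum the "Number of entries: N" lines printed by "gluster volume heal VOLNAME info".
count_pending_heals() {
  awk -F': ' '/^Number of entries:/ { sum += $2 } END { print sum + 0 }'
}

# Step 13: trigger a heal, then poll heal info until nothing is pending.
heal_and_wait() {
  local volname=$1 interval=${2:-60} pending
  gluster volume heal "$volname"
  while :; do
    pending=$(gluster volume heal "$volname" info | count_pending_heals)
    echo "entries pending heal: ${pending}"
    if (( pending == 0 )); then
      break
    fi
    sleep "$interval"
  done
}
```

Usage, with a hypothetical backup path and host name: on the rebuilt node run restore_glusterd_state /root/glusterd-backup, then copy_volume_id healthy-node /data/brick_dir, then start the daemons; after the find in step 11, run heal_and_wait gluster_volume on the healthy node.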

On Wed, Oct 5, 2016 at 4:46 PM, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
What I always do is just shut it down, repair (or replace) the brick, then start it up again with "... start $volname force".
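A minimal sketch of this approach, assuming the elided command is "gluster volume start VOLNAME force" and that a heal is triggered afterwards so the repaired brick resyncs from its replica partner. The function name is hypothetical; run it from any server in the pool once the repaired machine is back up.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the elided "..." is "gluster volume";
# restart_repaired_brick is a hypothetical helper name.
set -euo pipefail

restart_repaired_brick() {
  local volname=$1
  # "force" starts brick processes that are down even though the
  # volume as a whole is already started.
  gluster volume start "$volname" force
  # Check that every brick is listed as online.
  gluster volume status "$volname"
  # Trigger self-heal, then list entries that still need healing.
  gluster volume heal "$volname"
  gluster volume heal "$volname" info
}
```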

On October 5, 2016 11:27:36 PM GMT+02:00, Sergei Gerasenko <sgerasenko74@xxxxxxxxx> wrote:
Hi, sorry if this has been asked before, but the various sources of documentation are somewhat contradictory about exactly what to do.

I have a 6-node distributed-replicated cluster with a replica factor of 2, so it's 3 pairs of servers. I need to remove a server from one of those replica sets, rebuild it and put it back in.

What's the tried and proven sequence of steps for this? Any pointers would be very useful.

Thanks!
  Sergei



Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

