Re: Gluster 3.6.9 missing files during remove migration operations

Thanks Ravi,

I tested the migration a couple of times on my single-node test system using the procedure detailed in [1], and did not see any of the problems I was seeing with the other technique. We'll proceed with rolling this out across our test environment next week and, if that works out, we'll move on to production.

Thanks for your suggestion, I appreciate it.

Bernard.

On Apr 28, 2016, at 9:31 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

3.6.9 does not contain all the fixes needed to trigger auto-heal when modifying the replica count using the replace-brick/add-brick commands.
For replace-brick, you might want to try out the manual steps mentioned in the "Replacing brick in Replicate/Distributed Replicate volumes" section of [1].
For add-brick, the steps mentioned by Anuradha in [2] should work.

HTH,
Ravi

[1] http://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
[2] https://www.gluster.org/pipermail/gluster-users/2016-January/025083.html


On 04/29/2016 01:51 AM, Bernard Gardner wrote:
Further to this, I've continued my testing and discovered that during the same type of migration operations (add-brick followed by remove-brick in a replica=2 config), I see shell wildcard expansion sometimes returning multiple instances of the same filename - so it seems that the namespace for a FUSE mounted filesystem during a brick deletion operation is somewhat mutable. This behaviour is intermittent, but occurs frequently enough that I'd say it's repeatable.
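A minimal sketch of how the duplicate entries could be caught outside the shell, assuming the duplicates show up in the raw directory listing that wildcard expansion is built from. The function names are hypothetical and the mount point is passed in by the caller.

```python
import os
from collections import Counter


def duplicate_names(names):
    """Return {name: count} for every name that occurs more than once."""
    return {name: count for name, count in Counter(names).items() if count > 1}


def scan_for_duplicates(root):
    """Walk root and report each directory whose listing repeats an entry."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dups = duplicate_names(dirnames + filenames)
        if dups:
            report[dirpath] = dups
    return report
```

Running scan_for_duplicates("/mnt") in a loop while remove-brick is migrating data would show which directories are affected, rather than only that a glob returned too many names.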

Does anyone have any feedback on my previous question of this being expected behavior or a bug?

Thanks,
Bernard.

On 20 April 2016 at 19:55, Bernard Gardner <bernard@xxxxxxxxxxx> wrote:
Hi,

I'm running Gluster 3.6.9 on Ubuntu 14.04 on a single test server (under Vagrant and VirtualBox) with 4 filesystems in addition to the root: 2 are xfs directly on the disk, and the other 2 are xfs on an LVM config. The scenario I'm testing is the migration of our production gluster to LVM so that we can use the snapshot features in 3.6 to implement offline backups.

On my test machine, I configured a volume with replica 2 and 2 bricks (with both bricks on the same server). I then started and mounted the volume back onto the same server under /mnt and populated /mnt with a 3 level deep hierarchy of 16 directories, and in each of the leaf directories added 10 files of 1kB. So there are 40960 files in the filesystem (16x16x16x10), named like a/b/c/abc.0.
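The test tree described above could be generated with something like the following sketch. The directory names are an assumption (the 16 hex digits, chosen only because the example path is a/b/c/abc.0); the original post does not say which 16 names were used, and populate is a hypothetical helper.

```python
import itertools
import os
import tempfile


def populate(root, names="0123456789abcdef", depth=3, files_per_leaf=10, size=1024):
    """Create a depth-level tree under root and return the number of files written.

    Each leaf directory a/b/c receives files named abc.0, abc.1, ...
    of `size` bytes each.
    """
    payload = b"x" * size
    written = 0
    for parts in itertools.product(names, repeat=depth):
        leaf = os.path.join(root, *parts)
        os.makedirs(leaf, exist_ok=True)
        stem = "".join(parts)
        for i in range(files_per_leaf):
            with open(os.path.join(leaf, f"{stem}.{i}"), "wb") as fh:
                fh.write(payload)
            written += 1
    return written


if __name__ == "__main__":
    # Small demonstration in a scratch directory rather than on /mnt.
    with tempfile.TemporaryDirectory() as scratch:
        print(populate(scratch, names="ab", files_per_leaf=2))
```

With the defaults, populate("/mnt") writes 16 x 16 x 16 x 10 = 40960 files, matching the count quoted above.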

For my first test, I did a "replace-brick commit force" to swap the first brick in my config with a new brick on one of the xfs on LVM filesystems. This resulted in the /mnt filesystem appearing empty until I manually started a full heal on the volume, after which the files and directories started to re-appear on the mounted filesystem. After the heal completed, everything looked OK, but that's not going to work for our production systems. This approach appeared to be the suggestion from https://www.gluster.org/pipermail/gluster-users/2012-October/011502.html for a replicated volume.

For my second attempt, I rebuilt the test system from scratch, built and mounted the gluster volume the same way and populated it with the same test file configuration. I then did a volume add-brick and added both of the xfs on LVM filesystems to the configuration. The directory tree was copied to the new bricks, but no files were moved. I then did volume remove-brick on the 2 initial bricks and the system started migrating the files to the new filesystems. This looked more promising, but during the migration operation, I ran find /mnt -type f | wc -l a number of times and on one of those checks, the number of files was 39280 instead of 40960. I wasn't able to observe exactly which files were missing; I ran the command again immediately and it reported 40960 files, as it did on every other check during the migration.
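Since find | wc -l only reports a count, one way to see exactly which files drop out is to compare each listing against the set of paths that should exist. This is a sketch under the same naming assumption as the test tree (16 hex-digit directory names, files named like a/b/c/abc.0); all function names here are hypothetical.

```python
import itertools
import os
import time


def expected_files(names="0123456789abcdef", depth=3, files_per_leaf=10):
    """Relative paths the test tree should contain, e.g. a/b/c/abc.0."""
    paths = set()
    for parts in itertools.product(names, repeat=depth):
        stem = "".join(parts)
        for i in range(files_per_leaf):
            paths.add(os.path.join(*parts, f"{stem}.{i}"))
    return paths


def observed_files(root):
    """Relative paths of all regular directory entries currently visible under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_files(root, expected):
    """Sorted list of expected paths that are not visible under root right now."""
    return sorted(expected - observed_files(root))


def watch(root, expected, rounds=10, delay=1.0):
    """Poll root; return (round number, missing paths) for every short listing."""
    shortfalls = []
    for n in range(rounds):
        missing = missing_files(root, expected)
        if missing:
            shortfalls.append((n, missing))
        time.sleep(delay)
    return shortfalls
```

Calling watch("/mnt", expected_files()) while remove-brick is running would record the names behind a short count, which makes it easier to check whether the missing files share a directory or a brick.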

Is this expected behavior, or have I stumbled on a bug?

Is there a better workflow for completing this migration?

The production system runs in AWS and has 6 gluster servers over 2 availability zones, each of which has 1x600GB brick on an EBS volume, which are configured into a single 1.8TB volume with replication across the availability zones. We are planning on creating the new volumes with about 10% headroom left in the LVM config for holding snapshots, and hoping we can implement a backup solution by doing a gluster snapshot, followed by an EBS snapshot to get a consistent point in time offline backup (and then delete the gluster snapshot once the EBS snapshot has been taken). I haven't yet figured out the details of how we would restore from the snapshots (I can test that scenario once I have a working local test migration procedure and can migrate our test environment in AWS to support snapshots).
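The backup sequence described above (gluster snapshot, then EBS snapshot, then delete the gluster snapshot) could be laid out as follows. This is only a sketch that builds the command lines without running them. The volume name, snapshot name and EBS volume IDs are placeholders, and backup_commands is a hypothetical helper. It assumes the snapshot keeps the name it was given, and that something else waits for the EBS snapshots to finish before the final step is run.

```python
def backup_commands(volume, snap_name, ebs_volume_ids):
    """Return the ordered command lines for one backup cycle (nothing is executed)."""
    # 1. Take a gluster snapshot so the bricks hold a consistent image.
    commands = [["gluster", "snapshot", "create", snap_name, volume]]
    # 2. Take an EBS snapshot of every volume backing a brick.
    for volume_id in ebs_volume_ids:
        commands.append([
            "aws", "ec2", "create-snapshot",
            "--volume-id", volume_id,
            "--description", f"{volume} {snap_name}",
        ])
    # 3. Remove the gluster snapshot; --mode=script suppresses the confirmation prompt.
    commands.append(["gluster", "--mode=script", "snapshot", "delete", snap_name])
    return commands


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for cmd in backup_commands("gv0", "nightly", ["vol-EXAMPLE1", "vol-EXAMPLE2"]):
        print(" ".join(cmd))
```

Keeping the plan as data makes it easy to review or log the exact commands before handing them to subprocess.run on a real system.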

Thanks,
Bernard.




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


