Re: Strange file corruption

A-1) shut down node #1 (the first that is about to be upgraded)
A-2) remove node #1 from the Proxmox cluster (pvevm delnode "metal1")
A-3) remove node #1 from the Gluster volume/cluster (gluster volume remove-brick ... && gluster peer detach "metal1")
A-4) install Debian Jessie on node #1, overwriting all data on the HDD - with the same network settings and hostname as before
A-5) install Proxmox 4.0 on node #1
A-6) install Gluster on node #1 and add it back to the Gluster volume (gluster volume add-brick ...) => shared storage will be complete again (spanning 3.4 and 4.0 nodes)
A-7) configure the Gluster volume as shared storage in Proxmox 4 (node #1)
A-8) configure the external Backup storage on node #1 (Proxmox 4)
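For reference, the Gluster side of steps A-3 and A-6 might look something like the commands below. This is only a sketch: the volume name "gv0", the brick path "/data/brick1", and the replica counts are placeholders, not taken from the actual setup.

```shell
# Step A-3: take node #1 out of the Gluster volume/cluster.
# On a replica-3 volume, shrink to replica 2 while removing the brick:
gluster volume remove-brick gv0 replica 2 metal1:/data/brick1 force
gluster peer detach metal1

# Step A-6: after reinstalling, re-probe the node and grow back to
# replica 3. Run these from a surviving node (metal2 or metal3):
gluster peer probe metal1
gluster volume add-brick gv0 replica 3 metal1:/data/brick1

# Trigger self-heal to copy the data back, and watch progress:
gluster volume heal gv0 full
gluster volume heal gv0 info
```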

Was the data on the gluster brick deleted as part of step 4? When you remove the brick, gluster will no longer track pending changes for that brick. If you add it back in with stale data but matching gfids, you would have two clean bricks with mismatching data. Did you have to use "add-brick...force"?
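One way to check for that scenario is to compare the extended attributes of the same file directly on each brick (brick path and filename below are placeholders):

```shell
# Run on each node, against the brick path itself (not the FUSE mount).
# trusted.gfid must match across bricks for the same file; non-zero
# trusted.afr.* changelog counters show which brick Gluster believes
# holds pending (unhealed) changes.
getfattr -d -m . -e hex /data/brick1/images/100/vm-100-disk-1.qcow2
```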


On 12/09/2015 06:53 AM, Udo Giacomozzi wrote:
Am 09.12.2015 um 14:39 schrieb Lindsay Mathieson:

Udo, it occurs to me that if your VMs were running on #2 & #3 and you live-migrated them to #1 prior to rebooting #2/#3, then you would indeed rapidly get progressive VM corruption.

However it wouldn't be due to the heal process, but rather the live migration with "performance.stat-prefetch" on. This always leads to qcow2 files becoming corrupted and unusable.

Nope. All VMs were running on #1, no exception.
Nodes #2 and #3 never had a VM running on them, so they were practically idle since their installation.

Basically I set up node #1, including all VMs.
Then I installed nodes #2 and #3, configured the Proxmox and Gluster clusters, and waited quite some time until Gluster had synced up nodes #2 and #3 (healing).
From then on I rebooted nodes 2 & 3, but in theory these nodes never had to do any writes to the Gluster volume at all.

If you're interested, you can read about my upgrade strategy in this Proxmox forum post: http://forum.proxmox.com/threads/24990-Upgrade-3-4-HA-cluster-to-4-0-via-reinstallation-with-minimal-downtime?p=125040#post125040

Also, it seems rather strange to me that practically all ~15 VMs (!) suffered data corruption. It's as if Gluster considered node #2 or #3 to be ahead and "healed" in the wrong direction. I don't know...

BTW, once I understood what was going on, with the problematic "healing" still in progress, I was able to overwrite the bad images (still active on #1) using the standard Proxmox backup/restore, and Gluster handled it correctly.


Anyway, I really love the simplicity of Gluster (setting up and maintaining a cluster is extremely easy), but these healing issues are causing me some headaches... ;-)

Udo



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
