A-1) shut down node #1 (the first that is about to be upgraded)
A-2) remove node #1 from the Proxmox cluster (pvecm delnode "metal1")
A-3) remove node #1 from the Gluster volume/cluster (gluster volume remove-brick ... && gluster peer detach "metal1") - see the command sketch after this list
A-4) install Debian Jessie on node #1, overwriting all data on the HDD - with the same network settings and hostname as before
A-5) install Proxmox 4.0 on node #1
A-6) install Gluster on node #1 and add it back to the Gluster volume (gluster volume add-brick ...) => shared storage will be complete again (spanning 3.4 and 4.0 nodes)
A-7) configure the Gluster volume as shared storage in Proxmox 4 (node #1)
A-8) configure the external backup storage on node #1 (Proxmox 4)
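A rough sketch of the Gluster commands behind steps A-3 and A-6 could look like the following; the volume name "vol0", the brick path /data/brick, and the replica count of 3 are assumptions for illustration, not details taken from the original post:

  # A-3: drop node #1's brick and detach the peer (replica 3 -> 2)
  gluster volume remove-brick vol0 replica 2 metal1:/data/brick force
  gluster peer detach metal1

  # A-6: after reinstalling, probe the node again and re-add its (now empty) brick
  gluster peer probe metal1
  gluster volume add-brick vol0 replica 3 metal1:/data/brick
  gluster volume heal vol0 full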
Was the data on the gluster brick deleted as part of step 4? When
you remove the brick, gluster will no longer track pending changes
for that brick. If you add it back in with stale data but matching
gfids, you would have two clean bricks with mismatching data. Did
you have to use "add-brick...force"?
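To make the concern concrete (volume name and brick path are placeholders): a freshly wiped disk from step A-4 leaves an empty brick that simply heals from the remaining replicas, whereas a brick directory that still carries its old data and gfid xattrs would typically need force to be re-added, e.g.

  # inspect whether old gluster xattrs are still present on the brick root
  getfattr -d -m . -e hex /data/brick

  # re-adding a non-empty brick usually requires force
  gluster volume add-brick vol0 replica 3 metal1:/data/brick force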
On 12/09/2015 06:53 AM, Udo Giacomozzi wrote:
On 09.12.2015 at 14:39, Lindsay Mathieson wrote:
Udo, it occurs to me that if your VMs were running on #2 & #3 and you live-migrated them to #1 prior to rebooting #2/#3, then you would indeed rapidly get progressive VM corruption. However, it wouldn't be due to the heal process, but rather the live migration with "performance.stat-prefetch" on. This always leads to qcow2 files becoming corrupted and unusable.
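If stat-prefetch is indeed the culprit, the option can be checked and turned off per volume; "vol0" below is only a placeholder for the actual volume name:

  # show the volume's currently reconfigured options
  gluster volume info vol0

  # disable stat-prefetch for that volume
  gluster volume set vol0 performance.stat-prefetch off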
Nope. All VMs were running on #1, no exception.
Nodes #2 and #3 never had a VM running on them, so they have been practically idle ever since they were installed.
Basically, I set up node #1, including all VMs.
Then I installed nodes #2 and #3, configured the Proxmox and Gluster clusters, and waited quite some time until Gluster had synced up nodes #2 and #3 (healing).
From then on, I rebooted nodes 2 & 3, but in theory these nodes never had to do any writes to the Gluster volume at all.
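As a side note, heal progress can be checked per volume; "vol0" is again just a placeholder for the real volume name:

  # should report 0 entries per brick once the replicas are fully in sync
  gluster volume heal vol0 info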
If you're interested, you can read about my upgrade strategy in
this Proxmox forum post:
http://forum.proxmox.com/threads/24990-Upgrade-3-4-HA-cluster-to-4-0-via-reinstallation-with-minimal-downtime?p=125040#post125040
Also, it seems rather strange to me that practically all ~15 VMs (!) suffered from data corruption. It's as if Gluster considered node #2 or #3 to be ahead and "healed" in the wrong direction. I don't know...
BTW, once I understood what was going on, with the problematic "healing" still in progress, I was able to overwrite the bad images (still active on #1) using standard Proxmox backup-restore, and Gluster handled it correctly.
Anyway, I really love the simplicity of Gluster (setting up and maintaining a cluster is extremely easy), but these healing issues are causing me some headaches... ;-)
Udo
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users