On 11/03/2016 2:24 AM, David Gossage wrote:
> It is file-based, not block-based, healing, so it saw multi-GB files
> that it had to recopy. It had to halt all writes to those files while
> that occurred or it would be a never-ending cycle of re-copying the
> large images, so the fact that most VMs went haywire isn't that odd.
> Based on the timing of the alerts, it does look like the 2 bricks that
> were up kept serving images until the 3rd brick came back. It did heal
> all images just fine.
What version are you running? 3.7.x has sharding (it breaks large files
into chunks) to allow much finer-grained healing, which speeds up heals
a *lot*. However, it can't be applied retroactively; you have to enable
sharding and then copy the VM over :(
http://blog.gluster.org/2015/12/introducing-shard-translator/
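If it helps, here is a rough sketch of what enabling sharding looks
like. The volume name "gv0", the mount path and the 64MB shard size are
just examples, so check the defaults for your version first:

    # turn on the shard translator for an existing volume (only affects
    # files written after this point, existing images stay whole)
    gluster volume set gv0 features.shard on
    # optionally pick a shard size, 64MB is a commonly used value
    gluster volume set gv0 features.shard-block-size 64MB

    # with the VM shut down, copy each image so it gets rewritten as
    # shards, then swap it into place
    cp /mnt/gv0/images/vm1.qcow2 /mnt/gv0/images/vm1.qcow2.sharded
    mv /mnt/gv0/images/vm1.qcow2.sharded /mnt/gv0/images/vm1.qcow2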
Regarding rolling reboots: it can be done with replicated storage, and
gluster will transparently hand client reads/writes over to the
remaining bricks, but for each VM image only one copy at a time can be
healing, otherwise access will be blocked as you saw.
So the recommended procedure:
- Enable sharding
- Copy the VMs over
- When rebooting, wait for heals to complete before rebooting the next
  node (see the sketch below)
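For the "wait for heals" step, something like this works (again, "gv0"
is just a placeholder for your volume name):

    # list the files/shards each brick still needs to heal
    gluster volume heal gv0 info

    # or just watch the entry counts drop to zero before touching
    # the next node
    watch -n 60 'gluster volume heal gv0 info | grep "Number of entries"'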
nb: I thoroughly recommend 3-way replication as you have done; it saves
a lot of headaches with quorum and split-brain.
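With replica 3 you can also let gluster enforce quorum for you. These
are the standard volume options for that (just a sketch, check the docs
for your release before applying):

    # refuse writes unless a majority of bricks in the replica set are up
    gluster volume set gv0 cluster.quorum-type auto

    # take bricks offline if a node loses quorum with the rest of the pool
    gluster volume set gv0 cluster.server-quorum-type server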
--
Lindsay Mathieson
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users