Yes, this makes a lot of sense. It's the behavior I was experiencing that made no sense: when one node was shut down, the whole VM cluster locked up. I eventually tracked the culprit down to the quorum settings. I have now set the quorum to 2 bricks, and I am not experiencing the problem anymore. All my VM boot disks and data disks are now sharded. We are on 10 Gbit networking, and when the node comes back we see essentially no latency.
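For anyone hitting the same thing, a minimal sketch of what "2 bricks for quorum" plus sharding looks like on a replica-3 volume (the volume name "myvol" is just a placeholder):

    # require 2 of the 3 bricks to be up before allowing writes
    gluster volume set myvol cluster.quorum-type fixed
    gluster volume set myvol cluster.quorum-count 2

    # shard large files so heals copy small shards
    # instead of whole disk images
    # (only affects files created after it is enabled)
    gluster volume set myvol features.shard on
    gluster volume set myvol features.shard-block-size 64MB

Note that cluster.quorum-type auto gets you much the same result on replica 3, since it requires a majority of bricks; fixed/2 is just the explicit way to state it.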
Carl
On 2019-08-29 3:58 p.m., Darrell Budic wrote:
You may be misunderstanding the way the gluster system works in detail here, but you've got the right idea overall. Since gluster is maintaining 3 copies of your data, you can lose a drive or a whole system and things will keep going without interruption (well, mostly: if a host node was using the system that just died, it may pause briefly before re-connecting to one that is still running, via a backup-server setting or your DNS configs).

While the system is still going with one node down, that node is falling behind on new disk writes, and the remaining ones are keeping track of what's changing. Once you repair/recover/reboot the down node, it will rejoin the cluster. The recovered system then has to catch up, and it does this by having the other two nodes send it the changes. In the meantime, gluster serves any reads for that data from one of the up-to-date nodes, even if you ask the one you just restarted.

In order to do this healing, it has to lock the files to ensure no changes are made while it copies a chunk of them over to the recovered node. When it locks them, your hypervisor notices they have gone read-only and, especially if it has a pending write for that file, may pause the VM, because this looks like a storage issue to it. Once the file gets unlocked, it can be written again, and your hypervisor notices and will generally reactivate your VM. You may see delays too, especially if you only have 1G networking between your host nodes while everything is getting copied around. And your files may be locked, updated, unlocked, then locked again a few seconds or minutes later, and so on.
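On the "backup-server setting" mentioned above: for a fuse mount this is the backup-volfile-servers mount option, which lets the client fetch the volfile from another node if its primary server is down. A sketch, with host and volume names as placeholders:

    # client can still mount and re-fetch the volfile if node1 is down
    mount -t glusterfs \
        -o backup-volfile-servers=node2:node3 \
        node1:/myvol /mnt/myvol

(In oVirt the same string goes in the storage domain's mount options field, e.g. backup-volfile-servers=node2:node3.)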
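You can also watch the catch-up phase directly; if VMs pause, it usually lines up with pending heals on their images. For example (volume name again a placeholder):

    # list files/shards still needing heal, per brick
    gluster volume heal myvol info

    # just the counts, handy for watching the backlog drain
    gluster volume heal myvol statistics heal-count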
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users