On Wed, 21 May 2008, Anton Khalikov wrote:
> From my point of view AFR is something like RAID1. When one node goes
> down it continues to work using only one node. When the failed node
> comes back up, it does a resync from the working node to the failed
> one in the background. Mounted clients should not see this operation.
>
> In fact it works differently. When the failed node comes up, it starts
> to resync. From the client's point of view it looks like the file
> becomes 0 bytes long and then grows back up to the length it had
> before. When gluster serves a lot of small files, the whole resync
> process is almost invisible. When one uses glusterfs to host big files
> (20-40 GB), the resync over a 1Gbit LAN takes a few minutes.

Yes, this has been raised before. There was a long thread here about
possible ways to work around it with journalling, so that only the
deltas get transmitted for resync, or even an rsync-type file sync for
large files.
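
Something along these lines (paths and hostnames made up) shows the
kind of saving an rsync-based resync could give, since only the changed
blocks of the image would get shipped:

  # ship only the changed blocks of a big image, writing in place
  rsync --inplace --no-whole-file /data/export/domu.img \
      root@node2:/data/export/domu.img
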
> Now imagine, we have a XEN based VPS farm. DomU filesystems are
> actually files with an ext3 fs inside. These files are placed on
> glusterfs storage with AFR between 2 servers.

Another thing that will likely be a stumbling block is that GlusterFS
doesn't currently support sparse files. It will resync the virtual size
(as reported by ls) rather than the actual allocated size (which is
typically much smaller unless the virtual disk is full).
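
You can see the difference on any sparse image (file name made up):

  ls -lh /data/export/domu.img   # apparent size, e.g. 40G
  du -sh /data/export/domu.img   # blocks actually allocated, e.g. 6.2G

The resync will transfer the former, not the latter.
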
> One dom0 node was rebooted for maintenance. All domUs were migrated
> from it to the second server before rebooting, then migrated back.
> After a few minutes gluster started to resync the files in AFR, and
> every domU which tried to write something to its hdd found that the
> hdd was actually missing and remounted its fs read-only.

This may be to do with the posix locking. Currently the posix lock
server is the first server in the AFR list (and the order of servers in
AFR should be the same on all nodes, or else the locking won't work
properly). When the primary (lock) server goes away, all the locks
disappear, too. There was also another thread here discussing lock /
metadata distribution across the cluster with quorum locking, but that
is also as yet unimplemented.
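
For illustration only, the relevant bits of the spec files look roughly
like this (hostnames, paths and volume names are made up); the point is
that the subvolumes line of the AFR volume must list the servers in the
same order on every client:

  # server side: export the directory with posix locking on top
  volume brick-posix
    type storage/posix
    option directory /data/export
  end-volume

  volume brick
    type features/posix-locks
    subvolumes brick-posix
  end-volume

  # client side: same subvolume order on every node
  volume server1
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.0.0.1
    option remote-subvolume brick
  end-volume

  volume server2
    type protocol/client
    option transport-type tcp/client
    option remote-host 10.0.0.2
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/afr
    subvolumes server1 server2
  end-volume

With a layout like that, server1 is the lock server, so rebooting
server1 is exactly the case where all the locks go away.
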
> Is it correct behavior for AFR? Is there any way to force the resync
> process without affecting domUs? Maybe I needed to run:
>
>   head /path/to/domu.img > /dev/null
>
> from the rebooted server before migrating the domUs back to it?
> Wouldn't such enforcing be visible on all mounted clients?

If it's timeout related, then yes, head-ing the images will solve the
problem. If it's lock related, it won't make any difference.
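
If it does turn out to be timeout related, a loop like this on the
freshly rebooted server (mount point made up), run before migrating the
domUs back, would read the start of every image and force the resync up
front:

  # read a byte of each image so AFR resyncs it before the domUs return
  for img in /mnt/glusterfs/domains/*.img; do
      head -c 1 "$img" > /dev/null
  done
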
> I mean: when gfs does the resync, does the file become 0 bytes long
> and then grow back on all mount points, or does it only affect the
> server where gfs was just mounted?

Yes, it only affects the servers that have an out-of-date version of the
file (or don't have the file at all). The currently online servers that
are up to date will be showing the correct information.
For the particular use case you are describing, you may find that
something like DRBD would fit better, with a separate DRBD device per
VM.
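
For what it's worth, a per-VM resource in drbd.conf is only a few
lines; something along these lines (node names, devices and addresses
are all made up):

  resource vm01 {
    protocol C;                  # synchronous replication
    on node1 {
      device    /dev/drbd1;
      disk      /dev/vg0/vm01;   # backing LV for this domU
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd1;
      disk      /dev/vg0/vm01;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }

DRBD keeps a bitmap of the blocks that changed while a node was away
and only resyncs those, which avoids the whole-file resync described
above.
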
Gordan