Hi all,

I'm in the process of evaluating GlusterFS as a clustered file system. I like it very much because, among the other cool features, it's very easy to configure and it lets me reuse the filesystems I already know as storage backends.

Before trying it on expensive hardware, I decided to try it on a very low-end HW configuration: two old PCs (each with a P4-class CPU, IDE drives, and one 100 Mbps Ethernet card) and a 100 Mbps switch. The OS is Debian 'lenny' on both nodes; 'lenny' comes with FUSE v2.7.2. I then compiled glusterfs 1.3.10 on both nodes and set up a server-side, single-process AFR configuration (the spec file content is reported below). I did NOT use the glusterfs-patched FUSE library.

On top of that I've put some VMware Server virtual machines. Each virtual machine image is split into a few 2 GB "vmdk" files (not preallocated). I was able to start up and run my virtual machines (with only one additional line in their configuration files), so I was very happy with it.

The problem: after putting a virtual machine under "intense" I/O, when I rebooted it today I found that its root filesystem (i.e. the vmdk) was corrupted. It had lost some important directories (e.g. the kernel modules directory under /lib/modules).

To give a little more detail on the behaviour under I/O: while the virtual machine is doing I/O to the vmdk file, iptraf shows about 15-50 Mbps of traffic on the Ethernet link, so it looks like only the modified portions of the file are being sent to the other AFR node. In fact, if I simulate a failure by powering off the other AFR node, then after it comes back up I see 90 Mbps (link saturation) as soon as I open the vmdk file, and that open blocks until the full synchronization has finished.

The glusterfs.log content is as follows:

[...]
2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks: Active locks found!

The log above does not seem to justify such file corruption; there is nothing related to the "vmdk" files at all.

Any suggestions? Is this HW configuration way too slow for AFR to work reliably? Are there mistakes in the configuration file?

Any help is really appreciated.
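For completeness, on each node I mount the spec file locally via FUSE with something along these lines (the spec-file path, log path and mount point below are just examples, not necessarily my exact paths), and the VMware Server virtual machine directories live under that mount point:

  # mount the local spec file on the node itself (single-process server + client)
  glusterfs -f /etc/glusterfs/glusterfs.vol -l /var/log/glusterfs/glusterfs.log /mnt/glusterfs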
Kind regards,
Marco

----------GlusterFS config file ----------------

# dataspace on storage1
volume gfs-ds
  type storage/posix
  option directory /mnt/hda7/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

volume gfs-ds-threads
  type performance/io-threads
  option thread-count 1
  option cache-size 32MB
  subvolumes gfs-ds-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs-ds-threads
  # storage network access only
  option auth.ip.gfs-ds-threads.allow *
  option auth.ip.gfs-ds-afr.allow *
end-volume

# dataspace on storage2
volume gfs-storage2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host <the other node's IP>   # storage network
  option remote-subvolume gfs-ds-threads
  option transport-timeout 10                # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage2-ds    # local and remote dataspaces
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 128kB
  subvolumes gfs-ds-afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 64kB
  option page-count 16
  subvolumes writebehind
end-volume
-------------
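P.S. One related question. Since opening the vmdk after a node failure blocks until the whole file has been re-synchronized, would it be advisable to pre-trigger AFR self-heal on the entire export before starting the virtual machines again? My understanding of the 1.3-series behaviour is that simply opening each file from the mount point is enough to trigger it, so something like the following (mount point is just an example, and this is only my understanding, not taken verbatim from the docs):

  # open() every file once so AFR can self-heal it before the VMs touch it
  find /mnt/glusterfs -type f -exec head -c 1 {} \; > /dev/null

Is that the right way to do it on 1.3.10, or is there a better approach?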