Hi all,

I'm in the process of evaluating GlusterFS as a clustered file system. I like it very much because, among the other cool features, it's very easy to configure and it lets me reuse the filesystems I already know as storage backends.

Before trying it on expensive hardware, I decided to try it on a very low-end HW configuration: two old PCs (each with a P4-class CPU, IDE drives, and one 100 Mbps Ethernet card) and a 100 Mbps switch. The OS is Debian 'lenny' on both nodes; 'lenny' comes with FUSE v2.7.2. I then compiled glusterfs 1.3.10 on both nodes and set up a server-side, single-process AFR configuration (the spec file content is reported below). I did NOT use the glusterfs-patched FUSE library.

On top of that I've put some VMware Server virtual machines. Each virtual machine image is split into a few 2 GB "vmdk" files (not preallocated). I was able to start up and run my virtual machines (with only one additional line in their configuration files), so I was very happy with it.

The problem: after putting a virtual machine under "intense" I/O, when I rebooted it today I found that its root filesystem (i.e. the vmdk) was corrupted. It had lost some important directories (e.g. the kernel modules directory under /lib/modules).

To give a little more detail on the behaviour under I/O: while the virtual machine is doing I/O to the vmdk file, iptraf shows about 15-50 Mbps of traffic on the Ethernet link, so it looks like only the modified portions of the file are being sent to the other AFR node. In fact, if I simulate a failure by powering off the other AFR node, then after it comes back up I see 90 Mbps (link saturation) as soon as I open the vmdk file, and that open blocks until the full synchronization has finished.

The glusterfs.log content is as follows:

[...]
2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks: Active locks found!

The log above does not seem to justify such file corruption; there is nothing related to the "vmdk" files at all.

Any suggestions? Is this HW configuration way too slow for AFR to work reliably? Are there mistakes in the configuration file?

Any help is really appreciated.
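For completeness, on each node I mount the spec file locally via FUSE with something along these lines (the spec-file path, log path and mount point below are just examples, not necessarily my exact paths), and the VMware Server virtual machine directories live under that mount point:

  # mount the local spec file on the node itself (single-process server + client)
  glusterfs -f /etc/glusterfs/glusterfs.vol -l /var/log/glusterfs/glusterfs.log /mnt/glusterfs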
Kind regards,
Marco

----------GlusterFS config file ----------------

# dataspace on storage1
volume gfs-ds
  type storage/posix
  option directory /mnt/hda7/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

volume gfs-ds-threads
  type performance/io-threads
  option thread-count 1
  option cache-size 32MB
  subvolumes gfs-ds-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs-ds-threads
  # storage network access only
  option auth.ip.gfs-ds-threads.allow *
  option auth.ip.gfs-ds-afr.allow *
end-volume

# dataspace on storage2
volume gfs-storage2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host <the other node's IP>   # storage network
  option remote-subvolume gfs-ds-threads
  option transport-timeout 10                # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage2-ds    # local and remote dataspaces
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 128kB
  subvolumes gfs-ds-afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 64kB
  option page-count 16
  subvolumes writebehind
end-volume
-------------
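P.S. One related question. Since opening the vmdk after a node failure blocks until the whole file has been re-synchronized, would it be advisable to pre-trigger AFR self-heal on the entire export before starting the virtual machines again? My understanding of the 1.3-series behaviour is that simply opening each file from the mount point is enough to trigger it, so something like the following (mount point is just an example, and this is only my understanding, not taken verbatim from the docs):

  # open() every file once so AFR can self-heal it before the VMs touch it
  find /mnt/glusterfs -type f -exec head -c 1 {} \; > /dev/null

Is that the right way to do it on 1.3.10, or is there a better approach?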