Greetings,

We are beginning to experiment with GlusterFS and are having some problems using it with OpenSolaris and ZFS. Our testing so far is limited to one OpenSolaris system running glusterfsd 2.0.0 as a brick with a raidz2 ZFS pool, and one 64-bit Ubuntu client running the glusterfs 2.0.0 client with the FUSE patch.

Here's the config on our brick:

NAME       USED  AVAIL  REFER  MOUNTPOINT
data/cfs  23.5M  24.8T  54.7K  /cfs

glusterfs-server.vol
---
volume brick
  type storage/posix
  option directory /cfs
end-volume

volume server
  type protocol/server
  subvolumes brick
  option transport-type tcp
end-volume

glusterfs-client.vol
---
volume client
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.5
  option remote-subvolume brick
end-volume

The first problem is basic read/write. On the Ubuntu client, we start something like:

# iozone -i 0 -r 1m -s 64g -f /cfs/64gbtest

This hangs and dies after a while, usually within 10-15 minutes, with a "fsync: Transport endpoint is not connected" error. Smaller tests of 1g usually complete OK, but nothing above that does.

The second problem concerns accessing hidden ZFS snapshot directories. Let's say we take a snapshot on the brick and then start a simple write operation on the client, like this:

brick:  # zfs snapshot data/cfs@test
client: # touch file{1..10000}

With this running, an 'ls /cfs/.zfs' causes immediate "Transport endpoint is not connected" or "Function not implemented" errors. The log on the brick shows errors like:

2009-05-06 22:48:55 W [posix.c:1351:posix_create] brick: open on /.zfs/file1: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file1: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file2: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file3: Operation not applicable
...

It seems that once something tries to access the .zfs directory, the client or brick starts treating /cfs as if it were /.zfs.
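For what it's worth, the snapshot directory is not exposed deliberately; snapdir is left at the ZFS default of "hidden" on the brick. As a quick sketch of the check we run (plain ZFS commands, nothing GlusterFS-specific), the setting only controls whether .zfs shows up in directory listings, so it does not stop an explicit 'ls /cfs/.zfs' from triggering the failure above:

brick: # zfs get snapdir data/cfs         (expected to report 'hidden', the default)
brick: # zfs set snapdir=hidden data/cfs  (affects directory listings only)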
Once this happens, unmounting and remounting the filesystem on the client doesn't fix it:

# umount /cfs
# glusterfs -s 192.168.0.5 /cfs
# cd /cfs; touch test
touch: cannot touch `test': Function not implemented

The brick log again shows:

2009-05-12 15:16:49 W [posix.c:1351:posix_create] brick: open on /.zfs/test: Operation not applicable
2009-05-12 15:16:49 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/test: Operation not applicable

In one instance, performing this test caused a glusterfsd segfault:

2009-05-12 14:00:21 E [dict.c:2299:dict_unserialize] dict: undersized buffer passsed
pending frames:
frame : type(1) op(LOOKUP)

patchset: 7b2e459db65edd302aa12476bc73b3b7a17b1410
signal received: 11
configuration details:
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
spinlock 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.0
/lib/libc.so.1'__sighndlr+0xf [0xfed4cd5f]
/lib/libc.so.1'call_user_handler+0x2af [0xfed400bf]
/lib/libc.so.1'strlen+0x30 [0xfecc3ef0]
/lib/libc.so.1'vfprintf+0xa7 [0xfed12b8f]
/local/lib/libglusterfs.so.0.0.0'_gf_log+0x148 [0xfee11698]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'server_lookup+0x3c3 [0xfe3f25d3]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'protocol_server_interpret+0xc5 [0xfe3e6705]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'protocol_server_pollin+0x97 [0xfe3e69a7]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'notify+0x7f [0xfe3e6a2f]
/local/lib/glusterfs/2.0.0/transport/socket.so.0.0.0'socket_event_poll_in+0x3b [0xfe28416b]
/local/lib/glusterfs/2.0.0/transport/socket.so.0.0.0'socket_event_handler+0xa3 [0xfe284583]
/local/lib/libglusterfs.so.0.0.0'0x26c41 [0xfee26c41]
/local/lib/libglusterfs.so.0.0.0'event_dispatch+0x21 [0xfee26761]
/local/sbin/glusterfsd'0x3883 [0x804b883]
/local/sbin/glusterfsd'0x1f30 [0x8049f30]
---------

The only way to recover from this is to restart glusterfsd. I'm guessing this is to be expected, since the .zfs snapshot directory is a special case that GlusterFS has no knowledge of. The concern for us right now is that even with the .zfs directory hidden, someone can still accidentally try to access it and make the filesystem unavailable.

Finally, it appears that glusterfsd does asynchronous writes. Is it also possible to do synchronous writes? We are experimenting with SSDs and the ZFS intent log (ZIL) and would like to see whether there is a difference in performance.

Thanks.

-phillip
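P.S. In case it helps frame the synchronous-write question: the client-side comparison we have in mind is something along the lines of the run below, where iozone's -o flag (which, as we understand it, opens the test file with O_SYNC) would stand in for an application doing synchronous writes. The file name is just a placeholder.

# iozone -i 0 -r 1m -s 1g -o -f /cfs/sync-test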