Greetings,

We are beginning to experiment with GlusterFS and are having some problems using it with OpenSolaris and ZFS. Our testing so far is limited to one OpenSolaris system running glusterfsd 2.0.0 as a brick with a raidz2 ZFS pool, and one 64-bit Ubuntu client running the glusterfs 2.0.0 client with the FUSE patch.

Here's the config on our brick:

NAME       USED  AVAIL  REFER  MOUNTPOINT
data/cfs  23.5M  24.8T  54.7K  /cfs

glusterfs-server.vol
---
volume brick
  type storage/posix
  option directory /cfs
end-volume

volume server
  type protocol/server
  subvolumes brick
  option transport-type tcp
end-volume

glusterfs-client.vol
---
volume client
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.5
  option remote-subvolume brick
end-volume

The first problem is basic read/write. On the Ubuntu client, we start something like:

# iozone -i 0 -r 1m -s 64g -f /cfs/64gbtest

This hangs and dies after a while, usually within 10-15 minutes, with a "fsync: Transport endpoint is not connected" error. Smaller tests of 1g usually complete OK, but nothing above that does.

The second problem concerns accessing hidden ZFS snapshot directories. Let's say we take a snapshot on the brick and then start a simple write operation on the client, like this:

brick:  # zfs snapshot data/cfs@test
client: # touch file{1..10000}

With this running, an 'ls /cfs/.zfs' causes immediate "Transport endpoint is not connected" or "Function not implemented" errors. The log on the brick shows errors like:

2009-05-06 22:48:55 W [posix.c:1351:posix_create] brick: open on /.zfs/file1: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file1: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file2: Operation not applicable
2009-05-06 22:48:55 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/file3: Operation not applicable
...

It seems that once something tries to access the .zfs directory, the client or brick starts treating /cfs as if it were /.zfs.
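For what it's worth, the snapshot directory is not exposed deliberately; snapdir is left at the ZFS default of "hidden" on the brick. As a quick sketch of the check we run (plain ZFS commands, nothing GlusterFS-specific), the setting only controls whether .zfs shows up in directory listings, so it does not stop an explicit 'ls /cfs/.zfs' from triggering the failure above:

brick: # zfs get snapdir data/cfs         (expected to report 'hidden', the default)
brick: # zfs set snapdir=hidden data/cfs  (affects directory listings only)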
Once this happens, unmounting and remounting the filesystem on the client doesn't fix it:

# umount /cfs
# glusterfs -s 192.168.0.5 /cfs
# cd /cfs; touch test
touch: cannot touch `test': Function not implemented

The brick log again shows:

2009-05-12 15:16:49 W [posix.c:1351:posix_create] brick: open on /.zfs/test: Operation not applicable
2009-05-12 15:16:49 E [posix.c:751:posix_mknod] brick: mknod on /.zfs/test: Operation not applicable

In one instance, performing this test caused a glusterfsd segfault:

2009-05-12 14:00:21 E [dict.c:2299:dict_unserialize] dict: undersized buffer passsed
pending frames:
frame : type(1) op(LOOKUP)

patchset: 7b2e459db65edd302aa12476bc73b3b7a17b1410
signal received: 11
configuration details:
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
spinlock 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.0
/lib/libc.so.1'__sighndlr+0xf [0xfed4cd5f]
/lib/libc.so.1'call_user_handler+0x2af [0xfed400bf]
/lib/libc.so.1'strlen+0x30 [0xfecc3ef0]
/lib/libc.so.1'vfprintf+0xa7 [0xfed12b8f]
/local/lib/libglusterfs.so.0.0.0'_gf_log+0x148 [0xfee11698]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'server_lookup+0x3c3 [0xfe3f25d3]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'protocol_server_interpret+0xc5 [0xfe3e6705]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'protocol_server_pollin+0x97 [0xfe3e69a7]
/local/lib/glusterfs/2.0.0/xlator/protocol/server.so.0.0.0'notify+0x7f [0xfe3e6a2f]
/local/lib/glusterfs/2.0.0/transport/socket.so.0.0.0'socket_event_poll_in+0x3b [0xfe28416b]
/local/lib/glusterfs/2.0.0/transport/socket.so.0.0.0'socket_event_handler+0xa3 [0xfe284583]
/local/lib/libglusterfs.so.0.0.0'0x26c41 [0xfee26c41]
/local/lib/libglusterfs.so.0.0.0'event_dispatch+0x21 [0xfee26761]
/local/sbin/glusterfsd'0x3883 [0x804b883]
/local/sbin/glusterfsd'0x1f30 [0x8049f30]
---------

The only way to recover from this is to restart glusterfsd. I'm guessing this is to be expected, since the .zfs snapshot directory is a special case that GlusterFS has no knowledge of. The concern for us right now is that even with the .zfs directory hidden, someone can still accidentally try to access it and make the filesystem unavailable.

Finally, it appears that glusterfsd does asynchronous writes. Is it also possible to do synchronous writes? We are experimenting with SSDs and the ZFS intent log (ZIL) and would like to see whether there is a difference in performance.

Thanks.

-phillip
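P.S. In case it helps frame the synchronous-write question: the client-side comparison we have in mind is something along the lines of the run below, where iozone's -o flag (which, as we understand it, opens the test file with O_SYNC) would stand in for an application doing synchronous writes. The file name is just a placeholder.

# iozone -i 0 -r 1m -s 1g -o -f /cfs/sync-test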