Hi,

I have a two-node cluster where each node exports a filesystem to the other node, e.g.:

nodeA:
  2TB array at /dev/sdc; LVM PV/VG/LV created
  /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea
  /dev/sdc is exported to nodeb via gnbd
  nodeb's gnbd (/dev/sdc) device is imported
  /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb

nodeB:
  2TB array at /dev/sdc; LVM PV/VG/LV created
  /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb
  /dev/sdc is exported to nodea via gnbd
  nodea's gnbd (/dev/sdc) device is imported
  /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea

Everything seemed to work fine when I set it up. I ran some bonnie++ tests with pretty vigorous settings, on each node against its local GFS, on each node against the remote GFS, and both simultaneously, and everything worked fine.

I've now put 200+ GB of data on it, and I'm encountering a problem where normal processes like find, du, or ls hang against nodeb's array while on nodea. Messages like the following appear in dmesg on nodea (note that I have not used kill on any of these processes, so I'm not kill -9'ing them to get this):

  gnbd (pid 12082: du) got signal 9
  gnbd0: Send control failed (result -4)
  gnbd0: Receive control failed (result -32)
  gnbd0: shutting down socket
  exitting GNBD_DO_IT ioctl
  resending requests
  gnbd (pid 12082: du) got signal 1
  gnbd0: Send control failed (result -4)
  gnbd (pid 20598: find) got signal 9
  gnbd0: Send control failed (result -4)
  gnbd (pid 4238: diff) got signal 9
  gnbd0: Send control failed (result -4)
  gnbd0: Receive control failed (result -32)
  gnbd0: shutting down socket
  exitting GNBD_DO_IT ioctl
  resending requests

Looking at the code with my limited knowledge of kernel programming, it looks like this means a SIGKILL/SIGHUP got trapped during the sock_sendmsg/sock_recvmsg? It's pretty easy to get this problem to manifest. I can clear the hang by doing gnbd_export -O -R on the server (nodeb) and re-exporting. The client (nodea) automatically picks up the disconnect/reconnect and SIGKILLs the hung process.
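For reference, the manual recovery cycle I run looks roughly like this. The export name "nodeb_sdc" and the gnbd_import invocation are placeholders from memory of the gnbd tools, not exact copies of my command history; substitute whatever name the device was originally exported under:

```shell
# On the server (nodeb): force-remove the wedged export.
# -O overrides the in-use check, -R removes the exported device(s).
gnbd_export -O -R

# Re-export the raw device so the client can reattach
# ("nodeb_sdc" is a placeholder export name).
gnbd_export -d /dev/sdc -e nodeb_sdc

# The client (nodea) picks up the disconnect/reconnect on its own and
# SIGKILLs the hung process; if the import doesn't come back, re-import:
gnbd_import -i nodeb
```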
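For what it's worth, the numbers in those dmesg lines decode as follows. The errno reading is an assumption on my part: kernel socket calls conventionally return -errno, so I'm taking the "result -N" values to be raw errno codes passed back from sock_sendmsg/sock_recvmsg:

```shell
# Map the "got signal N" numbers to names with the shell's kill builtin.
kill -l 9    # -> KILL (matches the client SIGKILLing the hung process)
kill -l 1    # -> HUP

# The "result -N" codes read as -errno values from the socket calls:
#   -4  = -EINTR  (the call was interrupted by a pending signal)
#   -32 = -EPIPE  (the peer end of the socket went away mid-request)
```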
After this has happened a bunch of times, it looks like the GFS has gotten a little corrupted -- I ran gfs_fsck -y -v on it and it cleaned up a bunch of bitmap mismatches. It doesn't look like network connectivity is being lost at all between the two nodes, but I can't be absolutely sure a single packet didn't get dropped here or there.

Any help would be greatly appreciated!

-Ross

Vital statistics of the systems (both are running identical kernel + GFS/GNBD/CMAN/etc. modules, compiled on one and copied to the other):

  Linux nodea 2.6.12.6 #2 SMP Fri Apr 14 19:59:14 EDT 2006 i686 i686 i386 GNU/Linux
  cman-kernel-2.6.11.5-20050601.152643.FC4
  dlm-kernel-2.6.11.5-20050601.152643.FC4
  gfs-kernel-2.6.11.8-20050601.152643.FC4
  gnbd-kernel-2.6.11.2-20050420.133124.FC4

Both boxes are dual 2.8 GHz Xeons with 4 GB RAM each (but with the BIOS memory-mapping issue that prevents us from seeing all 4 GB, so really 3.3 GB). The arrays are SATA arrays on top of Areca cards -- one box has dual ARC-1120s and the other has a single ARC-1160 split up using LVM.

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster