Greetings,

I am trying to set up a cluster with (for now) two nodes. The reason is the semantic guarantees of GFS when accessing shared files (that is, I am not interested in fault tolerance, performance or anything else). Unfortunately, I keep running into all sorts of problems, for example:

- After a few hours of intensive workload, the cluster sometimes simply stops. All file system calls block, but things like cman_tool status or group_tool status insist everything is all right. A soft reboot is not possible because various services wait indefinitely, and after power cycling, fsck finds inconsistencies on the file system.

- Sometimes, when trying to execute a binary on the file system, I get execvp returning permission denied when it should not, but when I try again, everything is all right. I sometimes even observe this when trying to start a script on the file system, as if the interpreter of the script (which is on a different file system altogether) had wrong permissions. Again, simply trying one more time makes everything work.

The config of the cluster seems relatively simple:

- i686 single CPU node
  - file system device accessible over iSCSI
  - cluster subnet (unfortunately) connected over OpenVPN
- x86_64 eight CPU virtual node
  - file system device provided by host which uses iSCSI
- both nodes resolve into the same subnet using /etc/hosts
- nothing except a single GFS2 file system is mounted
- fencing uses fence_manual
- both nodes run Fedora 8

Config attached, not that there is anything unusual in it.

As an absolute novice, I am probably making some glaringly obvious silly mistake which results in the very weird behavior described above, but try as I might, I do not see anything that could cause this.

Thanks for any advice,
Petr

<?xml version="1.0" ?>
<cluster config_version="1" name="monoton">
  <fence_daemon post_fail_delay="-1" post_join_delay="-1"/>
  <clusternodes>
    <clusternode name="delta.dsrb" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="Fencer" nodename="delta.dsrb"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ichi.dsrb" nodeid="101" votes="1">
      <fence>
        <method name="1">
          <device name="Fencer" nodename="ichi.dsrb"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="Fencer"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
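
A quick structural sanity check of a config like the one above can be sketched in Python. This is only an illustration under stated assumptions: the function name check_cluster_conf and the SAMPLE string are hypothetical, and the script checks only internal consistency (unique nodeid values, every referenced fence device declared, and two_node="1" paired with exactly two nodes and expected_votes="1", as in the config above). It says nothing about the hangs or the execvp errors themselves.

```python
import xml.etree.ElementTree as ET

# Copy of the cluster element from the posted config (XML declaration omitted).
SAMPLE = """<cluster config_version="1" name="monoton">
  <fence_daemon post_fail_delay="-1" post_join_delay="-1"/>
  <clusternodes>
    <clusternode name="delta.dsrb" nodeid="1" votes="1">
      <fence><method name="1">
        <device name="Fencer" nodename="delta.dsrb"/>
      </method></fence>
    </clusternode>
    <clusternode name="ichi.dsrb" nodeid="101" votes="1">
      <fence><method name="1">
        <device name="Fencer" nodename="ichi.dsrb"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="Fencer"/>
  </fencedevices>
  <rm><failoverdomains/><resources/></rm>
</cluster>"""


def check_cluster_conf(xml_text):
    """Hypothetical helper: return a list of consistency problems (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []

    # Names of all fence devices declared under <fencedevices>.
    declared = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}

    nodes = root.findall("./clusternodes/clusternode")
    node_ids = [n.get("nodeid") for n in nodes]
    if len(set(node_ids)) != len(node_ids):
        problems.append("duplicate nodeid values")

    # Every node should reference at least one declared fence device.
    for node in nodes:
        name = node.get("name")
        devices = node.findall("./fence/method/device")
        if not devices:
            problems.append(f"{name}: no fence device configured")
        for dev in devices:
            if dev.get("name") not in declared:
                problems.append(f"{name}: undeclared fence device {dev.get('name')!r}")

    # two_node mode is meant for exactly two nodes with expected_votes="1".
    cman = root.find("./cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node=1 but node count is not 2")
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 but expected_votes is not 1")

    return problems


if __name__ == "__main__":
    for problem in check_cluster_conf(SAMPLE) or ["no structural problems found"]:
        print(problem)
```

Such a check only rules out typos in the file; it cannot tell whether fence_manual or the OpenVPN link plays any role in the behavior described.
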

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster