On Fri, 2007-06-08 at 16:20 +0100, Robert Clark wrote:
> On Fri, 2007-06-08 at 13:04 +0100, Patrick Caulfield wrote:
> > Robert Clark wrote:
> > > Does anyone know what might cause ccsd to continue to refuse
> > > connections for a lack of quorum after cman has decided the cluster is
> > > quorate?
>
> > The usual cause of this is the magma plugins either not being installed in the
> > right place or even at all. "magma_tool list" will show you which plugins are
> > installed, for CMAN you need the magma_sm.so plugin.
>
> Thanks for the quick reply. I put "magma_tool list" into the script
> just before and after trying to start fenced. The output both times is:
>
> Magma: Checking plugins in /lib/magma
>
> File             Status    Message
> ----             ------    -------
> magma_gulm.so    [OK]      GuLM Plugin v1.0.5
> magma_sm.so      [OK]      CMAN/SM Plugin v1.1.7.4
>
> Magma: 2 plugins available
>
> When I added "magma_tool quorum" as well, it reported "Connect
> failure: No cluster running?".

I've managed to get an strace of ccsd during the boot and it turned up
some interesting lines, which I've interspersed with selected log
entries:

Jun 8 22:20:27 localhost ccsd[2981]: Starting ccsd 1.0.10:
Jun 8 22:20:27 localhost kernel: CMAN 2.6.9-50.2 (built May 31 2007 15:39:24) installed
Jun 8 22:20:27 localhost kernel: NET: Registered protocol family 30
Jun 8 22:20:27 localhost ccsd[2981]: Built: May 31 2007 15:48:09
Jun 8 22:20:27 localhost ccsd[2981]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 8 22:20:27 localhost kernel: DLM 2.6.9-46.16 (built May 31 2007 15:45:51) installed
Jun 8 22:20:28 localhost ccsd[2981]: cluster.conf (cluster name = defuturo_test, version = 2) found.
Jun 8 22:20:28 localhost kernel: CMAN: Waiting to join or form a Linux-cluster
2990 22:20:28 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
Jun 8 22:20:29 localhost kernel: CMAN: sending membership request
2990 22:20:29 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
Jun 8 22:20:30 localhost kernel: CMAN: got node tamarillo
Jun 8 22:20:30 localhost kernel: CMAN: got node guava
Jun 8 22:20:30 localhost kernel: CMAN: quorum regained, resuming activity
Jun 8 22:20:30 localhost cman: startup succeeded
2990 22:20:30 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:31 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:32 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:33 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:34 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:35 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:36 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:37 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:38 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:39 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:40 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:41 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
2990 22:20:42 stat64("/dev/dlm-control", {st_mode=S_IFCHR|0600, st_rdev=makedev(10, 62), ...}) = 0
Jun 8 22:20:42 kiwano ccsd[2981]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
Jun 8 22:20:42 kiwano ccsd[2981]: Initial status:: Quorate

So, it looks like the problem is that the appearance of
/dev/dlm-control is being delayed in the 4U5 cluster.

	Robert

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
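
If the delay in /dev/dlm-control appearing is otherwise harmless, one
possible workaround is to have the boot script wait for the device node
before starting anything that needs ccsd to be connected (fenced, for
instance). This is only a minimal sketch, assuming a POSIX shell; the
helper name wait_for_path and the timeout value are illustrative and are
not part of any cluster package:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with cman/ccsd/magma):
# poll once per second until PATH exists or TIMEOUT seconds have elapsed.
# Returns 0 if the path appeared, 1 if the timeout expired first.
wait_for_path() {
    path=$1
    timeout=${2:-30}    # default of 30 seconds is an arbitrary choice
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

It could be called as "wait_for_path /dev/dlm-control 30" just before
the step that starts fenced, with a non-zero return treated as a reason
to log a warning rather than carry on silently. This only papers over
the symptom; it does not explain why the node is created late.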