On Mon, 2007-06-11 at 12:00 +0100, Patrick Caulfield wrote:
> Robert Clark wrote:
> > On Mon, 2007-06-11 at 11:05 +0100, Patrick Caulfield wrote:
> >> Robert Clark wrote:
> >>> Is the delay here likely to simply be udev being slow?
> >> It sounds like udev isn't creating it at all. What happens is that libdlm waits
> >> 10 seconds for udev to create the device file, and if it doesn't appear after
> >> that time it will do the job itself.
> > As an experiment, I've tried just loading the dlm module on a node
> > with no cluster services running and confirmed that dlm-control is being
> > created by udev.
> >
> > I must admit - I'm pretty confused now about the role of libdlm. Since
> > it turns out that I've been running a 4U4 cluster without the dlm
> > package installed (and so no libdlm) and, until this morning, my 4U5
> > cluster in the same state, I'm wondering: What uses libdlm?
> I don't think anything is, that might be the problem. But magma (which ccsd uses
> to talk to the cluster manager) checks for the existence of dlm-controld anyway!
> just in case you need to create any locks using magma I suppose.

OK, I may need to upgrade my condition from confused to baffled...

I've slapped an strace on udevd during boot and here are some excerpts:

Jun 11 16:23:48 localhost kernel: CMAN 2.6.9-50.2 (built May 31 2007 15:39:24) installed
Jun 11 16:23:48 localhost kernel: NET: Registered protocol family 30
Jun 11 16:23:48 localhost kernel: DLM 2.6.9-46.16 (built May 31 2007 15:45:51) installed

970   16:23:52 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={9, 0}}, NULL) = 0
970   16:23:52 select(6, [3 5], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
970   16:24:01 --- SIGALRM (Alarm clock) @ 0 (0) ---
970   16:24:01 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf6fc9708) = 3090
3090  16:24:01 execve("/sbin/udev", ["udev", "misc"], [/* 3 vars */]) = 0
3090  16:24:01 mknod("/dev/misc/dlm-control", S_IFCHR|0666, makedev(10, 62)) = 0

So, certainly udev is the main source of the delay (waiting for a
SIGALRM?) but, then, the same version of udev is on both clusters.

I guess I'll add something to the startup script to wait
for /dev/misc/dlm-control to exist before starting fenced.

Thanks,

Robert

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
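
The workaround mentioned above (wait for /dev/misc/dlm-control to appear before starting fenced) could look roughly like the sketch below. This is only an illustration, not the actual startup script: the helper name wait_for_path, the 15-second limit and the half-second poll interval are all assumptions, and it is written in Python for clarity where a real init script would most likely use a shell loop.

```python
import os
import time


def wait_for_path(path, timeout=15.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    The name and the default values are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Intended use (device path taken from the strace excerpt; timeout assumed):
#   if not wait_for_path("/dev/misc/dlm-control", timeout=15.0):
#       print("dlm-control did not appear; not starting fenced")
```

The timeout is chosen to be longer than the roughly nine-second gap visible in the strace, so the wait would normally end as soon as udev creates the node and only give up if it never appears.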