On Thu, Jun 28, 2007 at 02:39:44PM -0400, Lon Hohberger wrote:
>
> > *if all the nodes with SAN access are restarted (while the fifth node is
> > up), the nodes with SAN access first stop the services locally - and
> > then, apparently, ask the fifth node about the service status. Result:
> > a line like the following, for each service:
> >
> > --cut--
> > Jun 28 17:56:20 pcn2.mappi.helsinki.fi clurgmgrd[5895]: <err> #34: Cannot get status for service service:im
> > --cut--
>
> What do you mean here? (Sorry, being daft.)
>
> Restart all nodes = "just rgmanager on all nodes", or "reboot all
> nodes"?

Reboot all nodes.

> > *after that, the nodes with SAN access do nothing about any services
> > until after the fifth node has left the cluster and has been fenced.
>
> If you're rebooting the other 4 nodes, it sounds like the 5th is holding
> some sort of a lock which it shouldn't be across quorum transitions
> (which would be a bug).
>
> If this is the case, could you:
>
> * install rgmanager-debuginfo
> * get me a backtrace:
>
> gdb clurgmgrd `pidof clurgmgrd`
> thr a a bt

I'll try to find the time for this tomorrow or something. (This
behaviour doesn't really make the cluster unusable in production, so I'm
trying to solve the other problems first. ;)

--Janne

-- 
Janne Peltonen <janne.peltonen@xxxxxxxxxxx>
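
(For reference, the backtrace requested above can also be captured in one
shot rather than interactively. A minimal sketch, assuming gdb and the
rgmanager-debuginfo package are installed; the output filename is
illustrative, and if `pidof` returns more than one PID, pick one:)

    # Attach to the running clurgmgrd, print a backtrace of every thread
    # ("thr a a bt" is shorthand for "thread apply all bt"), then detach.
    gdb -batch -ex 'thread apply all bt' -p `pidof clurgmgrd` > clurgmgrd-bt.txt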