On Wed, Nov 15, 2006 at 12:11:00PM -0500, Lon Hohberger wrote:
> > Does this sound familiar to anyone? Has anyone encountered anything like
> > this in their experience?
>
> It doesn't sound familiar, but the easiest thing to do now is to first
> try *without* bonding, then try again with it (both nodes in
> active-backup mode).
>
> -- Lon

Thanks for the advice, Lon. Unfortunately the system is in production, so
we'll have to wait for an outage window before we can try it.

However, we have tried to simulate the setup using VMware. With one node
using load balancing for bonding, we can reproduce the error ("msg-open:
connection timed out. Could not connect to service manager") by disabling
one of the NICs in the bond and trying to relocate the service.

When we do this, 50% of packets get through (i.e. load balancing is working
and we can ping the other node), but the service fails to relocate with the
above error. When we have both NICs enabled, 100% of packets get through,
and service relocation works fine.

So this seems to establish that network problems can disrupt the relocation
of services if one of the nodes is using load balancing on its network
bonding. Sound reasonable?

We'll wait for an opportunity in the next few days to apply active-backup
to the bonding, but if anyone has any other musings in the meantime it
would be great to hear them, of course.

Thanks a lot!
Karl

--
Karl Podesta
Systems Engineer, Securelinx Ltd.
http://www.securelinx.com/