On Tue, 2006-11-14 at 16:44 +0100, Fabrizio Lippolis wrote:

> The cluster is running MySQL, one of the machines runs the MySQL process
> at a time while the database files are on the disk array. I checked that
> if I kill the process, it will migrate on the second machine. From time
> to time I experience occasional lockups of one of the two machines, it
> doesn't happen very often and apparently without reason. The only
> solution in this case is to brutally switch off the machine and reboot.
> The problem started to be much more frequent when I tried to add another
> service to the cluster, a LDAP directory. The crashes happened sometimes
> more than once a day.

:o

The only problems I'm aware of related to cluster service counts are
performance related (rgmanager used to slow down a lot with more
services), and only on pre-U4 versions.

> I already wrote about this problem some time ago and somebody answered
> that it could be caused because of the connection of the nodes to the
> disk array. When a node is accessing the disk array the SCSI bus will
> prevent the other node from doing something. Can anybody confirm this?

That's very array dependent, and I don't know much about how arrays work.
Even so, I do not think it should cause a lockup, unless there's some
kernel bug that it exposes.

Do they crash (panic), or do they just become totally unresponsive?

Have you tried getting a stack trace from the console using sysrq?
(echo 1 > /proc/sys/kernel/sysrq; then hit alt-sysrq-t from the console.)

One thing that's peculiar: if they are locking up, they have to be
locking up at about the same time. Otherwise, one would fence the other,
and life would go on.

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
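
For reference, a minimal sketch of the sysrq step mentioned in the reply, for the case where the node still accepts commands (on a fully unresponsive machine only the alt-sysrq-t key combination on the console applies). The function name dump_tasks and its two path arguments are hypothetical, added only for illustration; /proc/sys/kernel/sysrq and /proc/sysrq-trigger are the standard kernel interfaces, and writing to them needs root.

```shell
#!/bin/sh
# Sketch only: enable sysrq and request a task (stack) dump without a keyboard.
# dump_tasks is a hypothetical helper name, not part of any cluster tool.
# The paths are parameters so the function can be tried against scratch files.
dump_tasks() {
    sysrq_ctl="${1:-/proc/sys/kernel/sysrq}"
    sysrq_trigger="${2:-/proc/sysrq-trigger}"

    # Same as the command in the message: turn sysrq on.
    echo 1 > "$sysrq_ctl" || return 1

    # Same effect as alt-sysrq-t: ask the kernel to dump task states.
    echo t > "$sysrq_trigger" || return 1
}

# Typical use as root (the dump goes to the kernel log):
#   dump_tasks && dmesg | tail -n 200
```

The dump is written to the kernel log, so it should be captured from the console or with dmesg before the machine is power-cycled.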