On Thu, Jul 14, 2005 at 04:57:51PM -0400, Manuel Bujan wrote:
> Is there any issue I should be aware of if SMP is enabled in
> my kernel? What if I compile my kernel to be pre-emptible? Any
> problem with that and GFS?
>
> I am running GFS in a dual Xeon server from DELL.
> After a lot of time running my GFS setup I got the following error
> in one of our cluster servers, and I had to reboot it in order to
> re-establish the service:
>
> #################################################################################
> Jul 14 14:19:35 atmail-2 kernel: 2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (18044) req reply einval ae2c0092 fr 1 r 1 2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (31381) req reply einval bf9901e7 fr 1 r 1 2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 (2023) req reply einval d6c30333 fr 1 r 1 2
> Jul 14 14:19:35 atmail-2 kernel: gfs001 send einval to 1
> Jul 14 14:19:35 atmail-2 last message repeated 2 times

I found similar log snippets on a RHEL4U1 machine with dual Xeons (HP
ProLiant). The machine crashed with a kernel panic shortly after telling
the other nodes to leave the cluster (sorry, the staff was under pressure
and no one wrote down the panic's output):

  Sep 30 05:08:11 zs01 kernel: nval to 1 (P:kernel)
  Sep 30 05:08:11 zs01 kernel: data send einval to 1 (P:kernel)
  Sep 30 05:08:11 zs01 kernel: Magma send einval to 1 (P:kernel)
  Sep 30 05:08:11 zs01 kernel: data send einval to 1 (P:kernel)
  Sep 30 05:08:11 zs01 kernel: Magma send einval to 1 (P:kernel)
  Sep 30 05:08:33 zs03 kernel: CMAN: removing node zs02 from the cluster : Missed too many heartbeats (P:kernel)
  Sep 30 05:08:39 zs03 kernel: CMAN: removing node zs01 from the cluster : No response to messages (P:kernel)
  Sep 30 05:08:45 zs03 kernel: CMAN: quorum lost, blocking activity (P:kernel)

Searching for the einval messages, I found only this post, so it doesn't
seem to happen that often. OTOH, it's the same hardware; perhaps dual
Xeons are not good for GFS and/or the cluster infrastructure?
In my case the kernel and GFS bits all come from Red Hat. The only
self-built component is a qla2xxx driver, but the issue is on the
cluster communication side.
-- 
Axel.Thimm at ATrpms.net
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster