I arranged for a cluster's shared partition (/db2, ext3, attached via
HBA to a SAN) to be disconnected while the cluster was busy, expecting
the cluster to fail over to the standby node. That did not happen. From
what I can tell, the kernel on the active node reports the file system
error and then remounts /db2 read-only. The fs.sh script seems to check
that the /db2 mount exists but does not check its state, so fs.sh
reports nothing wrong (a rough sketch of the kind of check I mean
follows the log excerpt). Log entries:
Aug 25 11:38:59 caesar kernel: EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Aug 25 11:38:59 caesar kernel: Remounting filesystem read-only
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> Finding status of: /dev/sda1 /db2
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> in isMounted with /dev/sda1 and /db2
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> isMounted returned: 0
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> Past isMounted
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> running isAlive with: /db2
Aug 25 11:39:00 caesar clurgmgrd: [5719]: <crit> isAlive returnd: 0
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> Finding status of: /dev/sda1 /db2
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> in isMounted with /dev/sda1 and /db2
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> isMounted returned: 0
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> Past isMounted
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> running isAlive with: /db2
Aug 25 11:39:20 caesar clurgmgrd: [5719]: <crit> isAlive returnd: 1
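For illustration only, here is a minimal sketch (not the actual fs.sh
code) of how a status check could catch a read-only remount instead of
only testing that the mountpoint exists. It assumes the mountpoint is
/db2 and relies on /proc/mounts plus an ordinary write test; the
temporary file name is made up for the example.

  #!/bin/bash
  mnt=/db2

  # Check the mount options recorded in /proc/mounts for the "ro" flag.
  # Fields are: device mountpoint fstype options dump pass.
  if awk -v m="$mnt" '$2 == m { split($4, o, ","); for (i in o) if (o[i] == "ro") exit 1 }' /proc/mounts; then
      echo "$mnt is mounted read-write"
  else
      echo "$mnt has been remounted read-only"
  fi

  # Alternatively, attempt a real write; this fails once the kernel has
  # remounted the filesystem read-only after the journal abort.
  tmpfile="$mnt/.rw-test.$$"
  if touch "$tmpfile" 2>/dev/null; then
      rm -f "$tmpfile"
      echo "write test on $mnt succeeded"
  else
      echo "write test on $mnt failed"
  fi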
Is this a bug, or have I missed something?

The configuration is two systems in active/standby mode. Both are AS4
x86_64 and are currently up to date with their patches.
--
Neil Watson | Gentoo Linux
System Administrator | Uptime 29 days
http://watson-wilson.ca | 2.6.16.19 AMD Athlon(tm) MP 2000+ x 2