> The error message indicates resource group (RG) may get corrupted. Have
> you tried to do an fsck (or did it fixes anything) ?

Should this be done while the partition is not mounted on any of the nodes?

# ./fsck /dev/mapper/VolGroup03-web
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Couldn't find ext2 superblock, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/mapper/VolGroup03-web

I've also seen this in the log:

compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:qm"
compdev kernel: GFS: fsid=vgcomp:qm.1: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Done
compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
compdev kernel: GFS: fsid=vgcomp:web.3: RG = 31104599
compdev kernel: GFS: fsid=vgcomp:web.3: function = gfs_setbit
compdev kernel: GFS: fsid=vgcomp:web.3: file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71
compdev kernel: GFS: fsid=vgcomp:web.3: time = 1196105648
compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from the cluster
compdev kernel: GFS: fsid=vgcomp:web.3: waiting for outstanding I/O
compdev kernel: GFS: fsid=vgcomp:web.3: telling LM to withdraw
compdev kernel: lock_dlm: withdraw abandoned memory
compdev kernel: GFS: fsid=vgcomp:web.3: withdrawn

and:

compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
compdev kernel: GFS: fsid=vgcomp:web.3: Done
compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
compdev kernel: GFS: fsid=vgcomp:web.3: RG = 31104599
compdev kernel: GFS: fsid=vgcomp:web.3: function = gfs_setbit
compdev kernel: GFS: fsid=vgcomp:web.3: file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71

> Also do you remember any abnormal event (unclean shut-down, panic,
> power-lost, etc) *before* this issue pops out ?

Yes, I posted a few things about that recently. The cluster kept dying with kernel panics until I updated all of the nodes to be identical again. Since then, this node has been having these problems.

I have also noticed that cman never shuts down correctly when I reboot nodes, and that on reboot there is a lot of garbage (for lack of a better word) about volume group information that no longer exists.

Last but not least, I wasn't sure what to post here, so I decided I'd better post too much rather than not enough.

compdev rc.sysinit: Checking root filesystem succeeded
compdev kernel: IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
compdev rc.sysinit: Remounting root filesystem in read-write mode: succeeded
compdev kernel: TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
compdev lvm.static:
compdev kernel: TCP bind hash table entries: 131072 (order: 9, 3670016 bytes)
compdev lvm.static: connect() failed on local socket: Connection refused
compdev kernel: TCP: Hash tables configured (established 131072 bind 131072)
compdev lvm.static: WARNING: Falling back to local file-based locking.
compdev kernel: Initializing IPsec netlink socket
compdev lvm.static: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: NET: Registered protocol family 1
compdev lvm.static: 1 logical volume(s) in volume group VolGroup03 now active
compdev kernel: NET: Registered protocol family 17
compdev lvm.static: 1 logical volume(s) in volume group VolGroup02 now active
compdev kernel: Freeing unused kernel memory: 168k freed
compdev lvm.static: 1 logical volume(s) in volume group VolGroup01 now active
compdev kernel: SCSI subsystem initialized
compdev rc.sysinit: Setting up Logical Volume Management: succeeded
compdev kernel: QLogic Fibre Channel HBA Driver
compdev rc.sysinit: Checking filesystems succeeded
compdev kernel: qla2200 0000:00:11.0: Found an ISP2200, irq 11, iobase 0xe0816000
compdev rc.sysinit: Mounting local filesystems: succeeded
compdev kernel: qla2200 0000:00:11.0: Configuring PCI space...
compdev rc.sysinit: Enabling local filesystem quotas: succeeded
compdev kernel: qla2200 0000:00:11.0: Configure NVRAM parameters...
compdev rc.sysinit: Enabling swap space: succeeded
compdev kernel: qla2200 0000:00:11.0: Verifying loaded RISC code...
compdev init: Entering runlevel: 3
compdev kernel: qla2200 0000:00:11.0: LIP reset occured (0).
compdev microcode_ctl: microcode_ctl startup succeeded
compdev kernel: qla2200 0000:00:11.0: Waiting for LIP to complete...
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: LOOP UP detected (1 Gbps).
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: Topology - (F_Port), Host Loop address 0xffff
compdev vgchange:
compdev kernel: scsi0 : qla2xxx
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: qla2200 0000:00:11.0:
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: QLogic Fibre Channel HBA Driver: 8.01.04-d8
compdev vgchange: Volume group "WARNING:" not found
compdev kernel: QLogic QLA22xx -
compdev lvm2-monitor: Starting monitoring for VG WARNING:: failed
compdev kernel: ISP2200: PCI (33 MHz) @ 0000:00:11.0 hdma-, host#=0, fw=2.02.08 TP
compdev vgchange:
compdev kernel: Vendor: MYLEX Model: DACARMRB Rev: 7775
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: Type: Direct-Access ANSI SCSI revision: 02
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:0): Enabled tagged queuing, queue depth 16.
compdev vgchange:
compdev kernel: SCSI device sda: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: SCSI device sda: drive cache: write back
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: SCSI device sda: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: Volume group "Falling" not found
compdev kernel: SCSI device sda: drive cache: write back
compdev lvm2-monitor: Starting monitoring for VG Falling: failed
compdev kernel: sda:
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: Vendor: MYLEX Model: DACARMRB Rev: 7775
compdev vgchange:
compdev kernel: Type: Direct-Access ANSI SCSI revision: 02
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:1): Enabled tagged queuing, queue depth 16.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: SCSI device sdb: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: Volume group "back" not found
compdev kernel: SCSI device sdb: drive cache: write back
compdev lvm2-monitor: Starting monitoring for VG back: failed
compdev kernel: SCSI device sdb: 1013760000 512-byte hdwr sectors (519045 MB)
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdb: drive cache: write back
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: sdb:
compdev vgchange:
compdev kernel: Attached scsi disk sdb at scsi0, channel 0, id 0, lun 1
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: Vendor: MYLEX Model: DACARMRB Rev: 7775
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: Type: Direct-Access ANSI SCSI revision: 02
compdev vgchange: Volume group "to" not found
compdev kernel: qla2200 0000:00:11.0: scsi(0:0:0:2): Enabled tagged queuing, queue depth 16.
compdev lvm2-monitor: Starting monitoring for VG to: failed
compdev kernel: SCSI device sdc: 997449728 512-byte hdwr sectors (510694 MB)
compdev vgchange:
compdev kernel: SCSI device sdc: drive cache: write back
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdc: 997449728 512-byte hdwr sectors (510694 MB)
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: SCSI device sdc: drive cache: write back
compdev vgchange:
compdev kernel: sdc:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 2
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "local" not found
compdev kernel: device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel@xxxxxxxxxx
compdev lvm2-monitor: Starting monitoring for VG local: failed
compdev kernel: kjournald starting. Commit interval 5 seconds
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: EXT3-fs: mounted filesystem with ordered data mode.
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: SELinux: Disabled at runtime.
compdev vgchange:
compdev kernel: SELinux: Unregistering netfilter hooks
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: inserting floppy driver for 2.6.9-55.0.12.EL.XOS.1
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: Floppy drive(s): fd0 is 1.44M
compdev vgchange: Volume group "file-based" not found
compdev kernel: FDC 0 is a post-1991 82077
compdev lvm2-monitor: Starting monitoring for VG file-based: failed
compdev kernel: e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: e100: Copyright(c) 1999-2005 Intel Corporation
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev kernel: e100: eth0: e100_probe: addr 0xfebfe000, irq 5, MAC addr 00:20:94:10:43:67
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: e100: eth1: e100_probe: addr 0xfebfd000, irq 11, MAC addr 00:20:94:10:43:68
compdev vgchange: Volume group "locking." not found
compdev kernel: USB Universal Host Controller Interface driver v2.2
compdev lvm2-monitor: Starting monitoring for VG locking.: failed
compdev kernel: PCI: Enabling device 0000:00:07.2 (0000 -> 0001)
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: PCI: No IRQ known for interrupt pin D of device 0000:00:07.2. Please try using pci=biosirq.
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: uhci_hcd 0000:00:07.2: Found HC with no IRQ. Check BIOS/PCI 0000:00:07.2 setup!
compdev vgchange:
compdev kernel: md: Autodetecting RAID arrays.
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: md: autorun ...
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: md: ... autorun DONE.
compdev vgchange: Volume group "Volume" not found
compdev kernel: EXT3 FS on hda1, internal journal
compdev lvm2-monitor: Starting monitoring for VG Volume: failed
compdev kernel: Adding 787176k swap on /dev/hda2. Priority:-1 extents:1
compdev vgchange:
compdev kernel: IA-32 Microcode Update Driver: v1.14 <tigran@xxxxxxxxxxx>
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: microcode: CPU0 updated from revision 0x7 to 0x8, date = 05052000
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: IA-32 Microcode Update Driver v1.14 unregistered
compdev vgchange:
compdev kernel: ip_tables: (C) 2000-2002 Netfilter core team
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: ip_tables: (C) 2000-2002 Netfilter core team
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
compdev vgchange: Volume group "Groups" not found
compdev kernel: NET: Registered protocol family 10
compdev lvm2-monitor: Starting monitoring for VG Groups: failed
compdev kernel: Disabled Privacy Extensions on device c0386e60(lo)
compdev vgchange: connect() failed on local socket: Connection refused
compdev kernel: IPv6 over IPv4 tunneling driver
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev kernel: CMAN 2.6.9-50.2.0.6.XOS.1 (built Nov 15 2007 12:03:01) installed
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev kernel: NET: Registered protocol family 30
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev kernel: DLM 2.6.9-46.16.0.12.XOS.1 (built Nov 15 2007 12:27:30) installed
compdev vgchange: Volume group "with" not found
compdev lvm2-monitor: Starting monitoring for VG with: failed
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "the" not found
compdev lvm2-monitor: Starting monitoring for VG the: failed
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "clustered" not found
compdev lvm2-monitor: Starting monitoring for VG clustered: failed
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "attribute" not found
compdev lvm2-monitor: Starting monitoring for VG attribute: failed
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "will" not found
compdev lvm2-monitor: Starting monitoring for VG will: failed
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "be" not found
compdev lvm2-monitor: Starting monitoring for VG be: failed
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange:
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: Volume group "inaccessible." not found
compdev lvm2-monitor: Starting monitoring for VG inaccessible.: failed
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: 1 logical volume(s) in volume group "VolGroup01" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup01: succeeded
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: 1 logical volume(s) in volume group "VolGroup02" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup02: succeeded
compdev vgchange:
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: connect() failed on local socket: Connection refused
compdev vgchange: WARNING: Falling back to local file-based locking.
compdev vgchange: Volume Groups with the clustered attribute will be inaccessible.
compdev vgchange: 1 logical volume(s) in volume group "VolGroup03" monitored
compdev lvm2-monitor: Starting monitoring for VG VolGroup03: succeeded
compdev kudzu: succeeded
compdev sysctl: net.ipv4.ip_forward = 0
compdev sysctl: net.ipv4.conf.default.rp_filter = 1
compdev sysctl: net.ipv4.conf.default.accept_source_route = 0
compdev sysctl: kernel.sysrq = 0
compdev sysctl: kernel.core_uses_pid = 1
compdev sysctl: kernel.panic_on_oops = 1
compdev network: Setting network parameters: succeeded
compdev network: Bringing up loopback interface: succeeded
compdev network: Bringing up interface eth0: succeeded
compdev ccsd[2458]: Remote copy of cluster.conf is from quorate node.
compdev ccsd[2458]: Local version # : 80
compdev ccsd[2458]: Remote version #: 80
compdev kernel: CMAN: Waiting to join or form a Linux-cluster
compdev kernel: CMAN: sending membership request
compdev ccsd[2458]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
compdev ccsd[2458]: Initial status:: Inquorate
compdev kernel: CMAN: got node cweb93
compdev kernel: CMAN: got node cweb94
compdev kernel: CMAN: got node cweb92
compdev kernel: CMAN: got node img62
compdev ccsd[2458]: Cluster is quorate. Allowing connections.
compdev kernel: CMAN: quorum regained, resuming activity
compdev cman: startup succeeded
compdev fenced: startup succeeded
compdev clvmd: Cluster LVM daemon started - connected to CMAN
compdev clvmd: clvmd startup succeeded
compdev vgchange: 1 logical volume(s) in volume group "VolGroup03" now active
compdev vgchange: 1 logical volume(s) in volume group "VolGroup02" now active
compdev vgchange: 1 logical volume(s) in volume group "VolGroup01" now active
compdev clvmd: Activating VGs: succeeded
compdev netfs: Mounting other filesystems: succeeded
compdev kernel: Lock_Harness 2.6.9-72.2.0.9.XOS.1 (built Nov 15 2007 12:30:46) installed
compdev kernel: GFS 2.6.9-72.2.0.9.XOS.1 (built Nov 15 2007 12:31:07) installed
compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:qm"
compdev kernel: Lock_DLM (built Nov 15 2007 12:30:48) installed
compdev kernel: GFS: fsid=vgcomp:qm.1: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:qm.1: jid=1: Done
compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:web"
compdev kernel: GFS: fsid=vgcomp:web.3: Joined cluster. Now mounting FS...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Trying to acquire journal lock...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Looking at journal...
compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Done
compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
compdev kernel: GFS: fsid=vgcomp:web.3: Done
compdev gfs: Mounting GFS filesystems: succeeded

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
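In the boot log, the entries running from `Volume group "WARNING:" not found` through `Volume group "inaccessible." not found` appear to be the individual words of LVM's locking warning being treated as volume group names, one per monitoring attempt. The sketch below is a hypothetical reproduction of that pattern, not the actual lvm2-monitor script: it assumes the monitoring loop iterates over unquoted command output that also contains the warning text. `fake_vg_listing` and `list_monitor_targets` are made-up helper names.

```shell
#!/bin/sh
# Hypothetical reproduction, NOT the real lvm2-monitor script.
# Assumption: the loop that picks VGs to monitor reads captured output in
# which LVM's locking warning is mixed in with the real VG names.

# Made-up stand-in for that captured output; the warning text and VG
# names are copied from the boot log, the capture mechanism is assumed.
fake_vg_listing() {
    echo "WARNING: Falling back to local file-based locking."
    echo "Volume Groups with the clustered attribute will be inaccessible."
    echo "VolGroup01"
    echo "VolGroup02"
    echo "VolGroup03"
}

# The unquoted $(...) is split on whitespace, so every word of the warning
# becomes a separate loop item and would be tried as a VG name.
list_monitor_targets() {
    for vg in $(fake_vg_listing); do
        echo "$vg"
    done
}

list_monitor_targets
```

Running it prints the sixteen words of the warning followed by VolGroup01, VolGroup02 and VolGroup03, in the same order in which lvm2-monitor tried them in the log. If the assumption holds, keeping the warning out of whatever output the loop consumes would remove the bogus entries; that is an inference from the log, not a confirmed diagnosis.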