Hi all,
We have a three-node GFS2 cluster on CentOS 5.4. The output of uname -a is "Linux IMSTermServer4.vpmthane.org 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux".
It is set up as a two-node primary/primary DRBD cluster (drbd version: 8.3.7 (api:88/proto:86-91)). Two LVM volumes are created on top of DRBD, as shown in the drbd-overview output below:
10:r0 Connected Primary/Primary UpToDate/UpToDate C r---- lvm-pv: Fac1 400.00G 400.00G
11:r1 Connected Primary/Primary UpToDate/UpToDate C r---- lvm-pv: Stu1 491.20G 491.20G
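To check both resources at a glance, a minimal Python sketch could parse drbd-overview lines like the two above and flag any resource that is not Connected, Primary/Primary and UpToDate/UpToDate. The field layout is an assumption inferred only from these two lines, and the names parse_overview, healthy_dual_primary and SAMPLE are hypothetical helpers, not part of DRBD.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout, inferred only from the drbd-overview lines quoted above:
# <minor>:<resource> <conn-state> <role>/<peer-role> <disk>/<peer-disk> <protocol> ...
RESOURCE_RE = re.compile(
    r"^\s*(?P<minor>\d+):(?P<name>\S+)\s+(?P<cstate>\S+)\s+"
    r"(?P<roles>\S+)\s+(?P<disks>\S+)\s+(?P<proto>\S+)"
)


@dataclass
class DrbdResource:
    minor: int
    name: str
    cstate: str
    roles: Tuple[str, ...]
    disks: Tuple[str, ...]
    proto: str


def parse_overview(text: str) -> List[DrbdResource]:
    """Parse resource lines; anything not shaped like 'N:name ...' is skipped."""
    resources = []
    for line in text.splitlines():
        m = RESOURCE_RE.match(line)
        if m is None:
            continue
        resources.append(DrbdResource(
            minor=int(m.group("minor")),
            name=m.group("name"),
            cstate=m.group("cstate"),
            roles=tuple(m.group("roles").split("/")),
            disks=tuple(m.group("disks").split("/")),
            proto=m.group("proto"),
        ))
    return resources


def healthy_dual_primary(res: DrbdResource) -> bool:
    """True only if Connected, Primary on both nodes and UpToDate on both nodes."""
    return (res.cstate == "Connected"
            and res.roles == ("Primary", "Primary")
            and res.disks == ("UpToDate", "UpToDate"))


# Hypothetical sample: the two lines quoted above.
SAMPLE = """\
10:r0 Connected Primary/Primary UpToDate/UpToDate C r---- lvm-pv: Fac1 400.00G 400.00G
11:r1 Connected Primary/Primary UpToDate/UpToDate C r---- lvm-pv: Stu1 491.20G 491.20G
"""

if __name__ == "__main__":
    for r in parse_overview(SAMPLE):
        print(r.minor, r.name, "OK" if healthy_dual_primary(r) else "CHECK")
```

A resource caught mid-resync (for example in the SyncSource state with an Inconsistent peer disk, as in the log below) would be reported as CHECK rather than OK.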
The interconnect between the nodes is a dedicated Gigabit switch.
The third node imports the two file systems above through GNBD.
The setup worked fine for several months until we had a UPS failure. Ever since then, we have had very frequent GFS2 errors, file system withdrawals and node restarts. The errors in the log look like this:
Mar 13 11:12:40 IMSTermServer5 kernel: block drbd10: Resync done (total 4 sec; paused 0 sec; 12288 K/sec)
Mar 13 11:12:40 IMSTermServer5 kernel: block drbd10: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: fatal: filesystem consistency error
Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: RG = 26343101
Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 97
Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: about to withdraw this file system
Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: telling LM to withdraw
Mar 13 11:13:08 IMSTermServer5 kernel: block drbd11: Resync done (total 32 sec; paused 0 sec; 10112 K/sec)
Mar 13 11:13:08 IMSTermServer5 kernel: block drbd11: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Or, on another occasion:
Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: fatal: invalid metadata block
Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: bh = 26216898 (magic number)
Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: about to withdraw this file system
Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: telling LM to withdraw
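To quantify which volume withdraws most often, a minimal Python sketch could pull the GFS2 "fatal:" lines out of the kernel log and count them per file system. The regular expression is an assumption fitted only to the excerpts above, and fatal_events, fatals_per_filesystem and SAMPLE_LOG are hypothetical names.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Fitted only to the "fatal:" lines in the excerpts above, e.g.
# "Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: fatal: invalid metadata block"
# The fsid is split as <cluster>:<filesystem>.<suffix>; in the excerpts the
# numeric suffix differs per node (.0 on IMSTermServer5, .1 on IMSTermServer4).
FATAL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+GFS2:\s+"
    r"fsid=(?P<cluster>[^:]+):(?P<fs>[^.]+)\.(?P<suffix>\d+):\s+"
    r"fatal:\s+(?P<reason>.+)$"
)


def fatal_events(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per GFS2 'fatal:' line; all other lines are ignored."""
    events = []
    for line in lines:
        m = FATAL_RE.match(line.strip())
        if m is not None:
            events.append(m.groupdict())
    return events


def fatals_per_filesystem(lines: Iterable[str]) -> Counter:
    """Count fatal GFS2 errors per file system name (e.g. Gfs2Stu1)."""
    return Counter(e["fs"] for e in fatal_events(lines))


# Hypothetical sample: a few lines copied from the excerpts above.
SAMPLE_LOG = [
    "Mar 13 11:12:40 IMSTermServer5 kernel: block drbd10: Resync done (total 4 sec; paused 0 sec; 12288 K/sec)",
    "Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: fatal: filesystem consistency error",
    "Mar 13 11:12:43 IMSTermServer5 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.0: RG = 26343101",
    "Mar 8 13:23:02 IMSTermServer4 kernel: GFS2: fsid=NEW_BRIMS:Gfs2Stu1.1: fatal: invalid metadata block",
]

if __name__ == "__main__":
    for e in fatal_events(SAMPLE_LOG):
        print(e["ts"], e["host"], e["fs"], e["reason"])
    print(fatals_per_filesystem(SAMPLE_LOG))
```

Run over the full /var/log/messages of each node, the per-file-system counts would show how much more often the Stu volume fails than the Fac volume, and the timestamps can be lined up against the DRBD resync messages.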
The errors are much more frequent on the Stu volume. I don't want to lose any data.
I have run fsck.gfs2 several times from both servers against both the Fac and Stu volumes (unmounted, of course). I have also updated all cluster and cluster-storage RPMs, and upgraded from drbd-8.3.7rc1, which was installed earlier (and worked fine for several months), to the current stable drbd 8.3.7. The problems persist.
Any ideas on how I can resolve these issues?

With warm regards,
Koustubha Kale
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster