The system had been running well for a while, but recently we had a flaky disk in the RAID array, which we replaced with a healthy one. Since then the CLVM/GFS setup has become unusable: we can mount GFS, but listing it recursively with 'ls -R' hangs with an Input/output error. We can't even read the CLVM LUN raw with 'dd', BUT we can still read the underlying LVM PV devices with 'dd'. Reconfiguring the LVM volume as a local one and accessing it exclusively from one node makes no difference.

RHEL5: 2.6.18-164.11.1.el5

# modinfo gfs
filename:       /lib/modules/2.6.18-164.11.1.el5/weak-updates/gfs/gfs.ko
license:        GPL
author:         Red Hat, Inc.
description:    Global File System 0.1.34-2.el5
srcversion:     3B1BAC4069F1A4B556A958A
depends:        dlm
vermagic:       2.6.18-159.el5 SMP mod_unload gcc-4.1

# uname -r
2.6.18-164.11.1.el5

# modinfo /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
description:    AoE block/char driver for 2.6.2 and newer 2.6 kernels
author:         Sam Hopkins <sah@xxxxxxxxxx>
license:        GPL
srcversion:     42BF122979AC807F2BB50E6
depends:
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
parm:           aoe_iflist:aoe_iflist=dev1[,dev2...] (string)
parm:           version:aoe module version 74 (string)
parm:           aoe_dyndevs:Use dynamic minor numbers for devices. (int)
parm:           aoe_deadsecs:After aoe_deadsecs seconds, give up and fail dev. (int)
parm:           aoe_maxout:Only aoe_maxout outstanding packets for every MAC on eX.Y. (int)
parm:           aoe_maxsectors:When nonzero, set the maximum number of sectors per I/O request in new devices. (int)

# modinfo dlm
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/fs/dlm/dlm.ko
license:        GPL
author:         Red Hat, Inc.
description:    Distributed Lock Manager
srcversion:     E768995007648CA8DB078AE
depends:        configfs
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
module_sig:     883f3504b56fe19c59c69348c13cf1f1126a509f6ddaee3965ee8b5fcd04163669647a889a9801e09f722187d1de068c0d52cd2b99bc3d475cb6ca1a0

Here is what the kernel spits out:

Jul 6 11:27:36 kiwiland kernel: GFS 0.1.34-2.el5 (built Sep 9 2009 06:54:42) installed
Jul 6 11:27:36 kiwiland kernel: Lock_DLM (built Sep 9 2009 06:54:38) installed
Jul 6 11:27:36 kiwiland kernel: Lock_Nolock (built Sep 9 2009 06:54:37) installed
Jul 6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:files"
Jul 6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Trying to acquire journal lock...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Looking at journal...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Acquiring the transaction lock...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replaying journal...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replayed 0 of 11 blocks
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: replays = 0, skips = 4, sames = 7
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Journal replayed in 1s
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Done
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Trying to acquire journal lock...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Looking at journal...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Done
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Scanning for log elements...
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found 2 unlinked inodes
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found quota changes for 2 IDs
Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Done
Jul 6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:webcluster"
Jul 6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Trying to acquire journal lock...
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Looking at journal...
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Done
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Scanning for log elements...
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found 0 unlinked inodes
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found quota changes for 0 IDs
Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Done
Jul 6 11:27:37 kiwiland kernel: Installing knfsd (copyright (C) 1996 okir@xxxxxxxxxxxx).
Jul 6 11:27:39 kiwiland kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jul 6 11:27:39 kiwiland kernel: NFSD: starting 90-second grace period
Jul 6 11:32:21 kiwiland kernel: dlm: closing connection to node 1
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Trying to acquire journal lock...
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: bh = 1432543247 (magic)
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: function = gfs_rgrp_read
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/rgrp.c, line = 830
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: time = 1278372781
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Looking at journal...
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Acquiring the transaction lock...
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replaying journal...
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replayed 0 of 0 blocks
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: replays = 0, skips = 0, sames = 0
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Journal replayed in 1s
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Done
Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:files.0: withdrawn
Jul 6 11:33:02 kiwiland kernel:
Jul 6 11:33:02 kiwiland kernel: Call Trace:
Jul 6 11:33:02 kiwiland kernel: [<ffffffff88805018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
Jul 6 11:33:02 kiwiland kernel: [<ffffffff80063a36>] __wait_on_bit+0x60/0x6e
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8001538b>] sync_buffer+0x0/0x3f
Jul 6 11:33:02 kiwiland kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jul 6 11:33:02 kiwiland kernel: [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881cc97>] :gfs:gfs_meta_check_ii+0x32/0x3e
Jul 6 11:33:02 kiwiland kernel: [<ffffffff88819439>] :gfs:gfs_rgrp_read+0x139/0x225
Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fb8e8>] :gfs:glock_wait_internal+0x229/0x2c3
Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fbd17>] :gfs:gfs_glock_nq+0x395/0x3d6
Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fbd6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
Jul 6 11:33:02 kiwiland kernel: [<ffffffff88817466>] :gfs:gfs_rgrp_lvb_init+0x1e/0x3f
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881a46f>] :gfs:gfs_stat_gfs+0x213/0x273
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881353d>] :gfs:gfs_statfs+0x67/0xea
Jul 6 11:33:02 kiwiland kernel: [<ffffffff800deba3>] vfs_statfs+0x63/0x7f
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886d2ce>] :nfsd:nfsd_statfs+0x28/0x38
Jul 6 11:33:02 kiwiland kernel: [<ffffffff888745f8>] :nfsd:nfsd3_proc_fsstat+0x3f/0x54
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
Jul 6 11:33:02 kiwiland kernel: [<ffffffff886e0529>] :sunrpc:svc_process+0x454/0x71b
Jul 6 11:33:02 kiwiland kernel: [<ffffffff80064644>] __down_read+0x12/0x92
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a746>] :nfsd:nfsd+0x1a5/0x2cb
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul 6 11:33:02 kiwiland kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jul 6 11:33:02 kiwiland kernel:

And here is another one the kernel spat out, this time on Hercules:

Jul 5 02:01:19 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278252079
Jul 5 03:01:16 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278255676
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: bh = 86700288 (magic)
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: function = gfs_get_meta_buffer
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 1225
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: time = 1278255737
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
Jul 5 03:02:21 Hercules kernel: GFS: fsid=FSC:files.0: withdrawn
Jul 5 03:02:21 Hercules kernel:
Jul 5 03:02:21 Hercules kernel: Call Trace:
Jul 5 03:02:21 Hercules kernel: [<ffffffff8880a018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
Jul 5 03:02:21 Hercules kernel: [<ffffffff8001538b>] sync_buffer+0x0/0x3f
Jul 5 03:02:21 Hercules kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jul 5 03:02:21 Hercules kernel: [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
Jul 5 03:02:21 Hercules kernel: [<ffffffff88821c97>] :gfs:gfs_meta_check_ii+0x32/0x3e
Jul 5 03:02:21 Hercules kernel: [<ffffffff887f7717>] :gfs:gfs_get_meta_buffer+0x1d1/0x247
Jul 5 03:02:21 Hercules kernel: [<ffffffff88804193>] :gfs:gfs_copyin_dinode+0x1d/0x12f
Jul 5 03:02:21 Hercules kernel: [<ffffffff88800d6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
Jul 5 03:02:21 Hercules kernel: [<ffffffff888043e3>] :gfs:inode_create+0x13e/0x1df
Jul 5 03:02:21 Hercules kernel: [<ffffffff88804a5d>] :gfs:gfs_inode_get+0x9d/0xba
Jul 5 03:02:21 Hercules kernel: [<ffffffff888053bb>] :gfs:gfs_lookupi+0x33d/0x3df
Jul 5 03:02:21 Hercules kernel: [<ffffffff887fce57>] :gfs:ea_find_i+0x0/0x6b
Jul 5 03:02:21 Hercules kernel: [<ffffffff888172af>] :gfs:gfs_lookup+0x363/0x41a
Jul 5 03:02:21 Hercules kernel: [<ffffffff80025426>] igrab+0x25/0x34
Jul 5 03:02:21 Hercules kernel: [<ffffffff888055a0>] :gfs:gfs_iget+0x3d/0x1f1
Jul 5 03:02:21 Hercules kernel: [<ffffffff88801224>] :gfs:gfs_glock_dq+0x13c/0x14b
Jul 5 03:02:21 Hercules kernel: [<ffffffff8000cf01>] do_lookup+0xe5/0x1e6
Jul 5 03:02:21 Hercules kernel: [<ffffffff8000a22b>] __link_path_walk+0xa01/0xf42
Jul 5 03:02:21 Hercules kernel: [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
Jul 5 03:02:21 Hercules kernel: [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
Jul 5 03:02:21 Hercules kernel: [<ffffffff80012752>] getname+0x15b/0x1c2
Jul 5 03:02:21 Hercules kernel: [<ffffffff800236ba>] __user_walk_fd+0x37/0x4c
Jul 5 03:02:21 Hercules kernel: [<ffffffff8003f235>] vfs_lstat_fd+0x18/0x47
Jul 5 03:02:21 Hercules kernel: [<ffffffff8002a95a>] sys_newlstat+0x19/0x31
Jul 5 03:02:21 Hercules kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Jul 5 03:02:21 Hercules kernel: [<ffffffff8005d116>] system_call+0x7e/0x83

Thanks in advance,
--
Abraham

''''''''''''''''''''''''''''''''''''''''''''''''''''''
Abraham Alawi
Unix/Linux Systems Administrator
Science IT
University of Auckland
e: a.alawi@xxxxxxxxxxxxxx
p: +64-9-373 7599, ext#: 87572
''''''''''''''''''''''''''''''''''''''''''''''''''''''
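One way to follow up on the 'fatal: invalid metadata block ... bh = N (magic)' lines is to read the reported block directly and look at its first four bytes. The sketch below is a hypothetical helper, not a gfs-utils tool, and it rests on two assumptions: that a GFS metadata block starts with the big-endian magic 0x01161970, and that the filesystem uses the default 4096-byte block size (pass the real block size as the third argument if it differs). It only reads the device; it never writes to it.

```shell
#!/bin/sh
# Hypothetical helper: check_gfs_magic DEVICE BLOCK [BLOCKSIZE]
# Assumptions: GFS metadata blocks begin with the big-endian magic
# 0x01161970, and BLOCK is counted in filesystem blocks from the start
# of DEVICE (default block size 4096 bytes).
check_gfs_magic() {
    dev=$1
    blk=$2
    bs=${3:-4096}
    # Read exactly one block (read-only) and dump its first 4 bytes as hex.
    magic=$(dd if="$dev" bs="$bs" skip="$blk" count=1 2>/dev/null \
        | od -A n -t x1 -N 4 | tr -d ' \n')
    if [ "$magic" = "01161970" ]; then
        echo "block $blk: GFS magic OK"
        return 0
    fi
    # An empty result means the read itself failed (e.g. an I/O error).
    echo "block $blk: bad magic (${magic:-unreadable})"
    return 1
}
```

For example, 'check_gfs_magic /dev/myvg/mylv 1432543247' (the VG/LV path is a placeholder) would print 'unreadable' if the read fails the same way the raw 'dd' on the LV does, and a hex value other than 01161970 if the block is readable but does not carry the expected magic.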