> Hi Wendell,
>
> What's the status on this? I haven't heard back from you.
> Did the metadata save finish? If so, where can I download it?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

As Bob mentioned in a separate email, perhaps someone else can suggest what might be causing the complete loss of IP connectivity, which *appears* to be related to the GFS2 issues, even though that *should* not happen...

The setup: three physical servers with clvmd, Xen kernels, GFS2, shared FC storage, and multiple virtual machine instances. Each physical server mounts 3 fairly large GFS2 filesystems. Each one has 2 physical NICs and a virtual NIC. One physical NIC gets a 192.168.x.y private IP and connects to a good gigE switch for server-to-server clustering/GFS communications. The virtual NIC is visible only between a VM and the physical host it runs on. Each physical server NFS-exports those GFS2 filesystems, and a number of the VMs mount them via that host-only virtual NIC.

Our first indication of a problem is generally loss of IP connectivity to one or more of the VMs. This morning, the first one we noticed we couldn't access was running on physical server3. We immediately started checking to see what was going on and noticed that doing an "ls" on server1 of one of the GFS2 filesystems gave an I/O error, while the same against the 2nd GFS2 filesystem hung. Both server2 and server3 appeared able to fully access those filesystems.

The VM that lost IP connectivity was running on server3, so there's a strange incongruity: server3 appears healthy, nothing odd in its logs, and it can access the GFS2 filesystems just fine, yet at about the time server1 starts failing on a GFS2 filesystem, a VM running on server3 loses all IP connectivity. It has to be related, but I'm just not sure how or why.

Our last crash was around 9:30 this morning. Just now I noticed server2 has this in /var/log/messages (see below).
But, as of right now, I *can* access files on all 3 GFS2-mounted filesystems on server2 without issue, it would appear. That's probably a sign that it's going to belly up here in a bit, maybe(?)

Oh, one other thing: you mentioned needing to run gfs2_fsck *twice* to fix some problems, Bob. I did not know that, and I did not run it twice on the first attempt to repair the filesystem I think is causing the grief here. If you don't complete the changes to gfs2_fsck that remove the need for two passes by this weekend, would you recommend we unmount and run it twice? Thanks.

Aug 4 09:40:39 server2 ntpd[5125]: synchronized to 192.168.77.1, stratum 3
Aug 4 13:01:01 server2 kernel: kswapd0: page allocation failure. order:0, mode:0xd0
Aug 4 13:01:01 server2 kernel:
Aug 4 13:01:01 server2 kernel: Call Trace:
Aug 4 13:01:01 server2 kernel: [<ffffffff8020f6b0>] __alloc_pages+0x2b5/0x2ce
Aug 4 13:01:01 server2 kernel: [<ffffffff8020c6e3>] do_generic_mapping_read+0x389/0x3de
Aug 4 13:01:01 server2 kernel: [<ffffffff88554a7a>] :gfs2:gfs2_read_actor+0x0/0x7b
Aug 4 13:01:01 server2 kernel: [<ffffffff8025dfb7>] cache_alloc_refill+0x267/0x4a9
Aug 4 13:01:01 server2 kernel: [<ffffffff8020afb1>] kmem_cache_alloc+0x50/0x6d
Aug 4 13:01:01 server2 kernel: [<ffffffff8854a910>] :gfs2:gfs2_glock_get+0x9f/0x29d
Aug 4 13:01:01 server2 kernel: [<ffffffff8855c5f5>] :gfs2:read_rindex_entry+0x317/0x32a
Aug 4 13:01:01 server2 kernel: [<ffffffff8855c920>] :gfs2:gfs2_rindex_hold+0x101/0x153
Aug 4 13:01:01 server2 kernel: [<ffffffff80225cd7>] find_or_create_page+0x3c/0xab
Aug 4 13:01:01 server2 kernel: [<ffffffff88543714>] :gfs2:do_strip+0xba/0x358
Aug 4 13:01:01 server2 kernel: [<ffffffff88551d20>] :gfs2:gfs2_meta_read+0x17/0x65
Aug 4 13:01:01 server2 kernel: [<ffffffff885427e2>] :gfs2:recursive_scan+0xf2/0x175
Aug 4 13:01:01 server2 kernel: [<ffffffff885428fe>] :gfs2:trunc_dealloc+0x99/0xe7
Aug 4 13:01:01 server2 kernel: [<ffffffff8854365a>] :gfs2:do_strip+0x0/0x358
Aug 4 13:01:01 server2 kernel: [<ffffffff80290000>] do_sysctl+0x18f/0x26e
Aug 4 13:01:01 server2 kernel: [<ffffffff885588b9>] :gfs2:gfs2_delete_inode+0xe3/0x18d
Aug 4 13:01:01 server2 kernel: [<ffffffff8855881c>] :gfs2:gfs2_delete_inode+0x46/0x18d
Aug 4 13:01:01 server2 kernel: [<ffffffff885587d6>] :gfs2:gfs2_delete_inode+0x0/0x18d
Aug 4 13:01:01 server2 kernel: [<ffffffff802303a5>] generic_delete_inode+0xc6/0x143
Aug 4 13:01:01 server2 kernel: [<ffffffff802d9566>] prune_one_dentry+0x4d/0x76
Aug 4 13:01:01 server2 kernel: [<ffffffff8022f865>] prune_dcache+0x10f/0x149
Aug 4 13:01:01 server2 kernel: [<ffffffff802d95a6>] shrink_dcache_memory+0x17/0x30
Aug 4 13:01:01 server2 kernel: [<ffffffff80240951>] shrink_slab+0xdc/0x154
Aug 4 13:01:01 server2 kernel: [<ffffffff802599c6>] kswapd+0x347/0x447
Aug 4 13:01:01 server2 kernel: [<ffffffff8026defe>] monotonic_clock+0x35/0x7b
Aug 4 13:01:01 server2 kernel: [<ffffffff80299fe8>] autoremove_wake_function+0x0/0x2e
Aug 4 13:01:01 server2 kernel: [<ffffffff80299dd0>] keventd_create_kthread+0x0/0xc4
Aug 4 13:01:01 server2 kernel: [<ffffffff8025967f>] kswapd+0x0/0x447
Aug 4 13:01:01 server2 kernel: [<ffffffff80299dd0>] keventd_create_kthread+0x0/0xc4
Aug 4 13:01:01 server2 kernel: [<ffffffff802334b4>] kthread+0xfe/0x132
Aug 4 13:01:01 server2 kernel: [<ffffffff8025fb2c>] child_rip+0xa/0x12
Aug 4 13:01:01 server2 kernel: [<ffffffff80299dd0>] keventd_create_kthread+0x0/0xc4
Aug 4 13:01:01 server2 kernel: [<ffffffff802333b6>] kthread+0x0/0x132
Aug 4 13:01:01 server2 kernel: [<ffffffff8025fb22>] child_rip+0x0/0x12
Aug 4 13:01:01 server2 kernel:
Aug 4 13:01:01 server2 kernel: Mem-info:
Aug 4 13:01:01 server2 kernel: DMA per-cpu:
Aug 4 13:01:01 server2 kernel: cpu 0 hot: high 186, batch 31 used:0
Aug 4 13:01:01 server2 kernel: cpu 0 cold: high 62, batch 15 used:48
Aug 4 13:01:01 server2 kernel: cpu 1 hot: high 186, batch 31 used:21
Aug 4 13:01:01 server2 kernel: cpu 1 cold: high 62, batch 15 used:8
Aug 4 13:01:01 server2 kernel: cpu 2 hot: high 186, batch 31 used:176
Aug 4 13:01:01 server2 kernel: cpu 2 cold: high 62, batch 15 used:9
Aug 4 13:01:01 server2 kernel: cpu 3 hot: high 186, batch 31 used:128
Aug 4 13:01:01 server2 kernel: cpu 3 cold: high 62, batch 15 used:3
Aug 4 13:01:01 server2 kernel: cpu 4 hot: high 186, batch 31 used:156
Aug 4 13:01:01 server2 kernel: cpu 4 cold: high 62, batch 15 used:10
Aug 4 13:01:01 server2 kernel: cpu 5 hot: high 186, batch 31 used:28
Aug 4 13:01:01 server2 kernel: cpu 5 cold: high 62, batch 15 used:50
Aug 4 13:01:01 server2 kernel: cpu 6 hot: high 186, batch 31 used:157
Aug 4 13:01:01 server2 kernel: cpu 6 cold: high 62, batch 15 used:9
Aug 4 13:01:01 server2 kernel: cpu 7 hot: high 186, batch 31 used:13
Aug 4 13:01:01 server2 kernel: cpu 7 cold: high 62, batch 15 used:2
Aug 4 13:01:01 server2 kernel: DMA32 per-cpu: empty
Aug 4 13:01:01 server2 kernel: Normal per-cpu: empty
Aug 4 13:01:01 server2 kernel: HighMem per-cpu: empty
Aug 4 13:01:01 server2 kernel: Free pages: 0kB (0kB HighMem)
Aug 4 13:01:01 server2 kernel: Active:131632 inactive:475042 dirty:89 writeback:0 unstable:0 free:0 slab:129221 mapped-file:31572 mapped-anon:75077 pagetables:4941
Aug 4 13:01:01 server2 kernel: DMA free:0kB min:7100kB low:8872kB high:10648kB active:526528kB inactive:1900168kB present:3153920kB pages_scanned:0 all_unreclaimable? no
Aug 4 13:01:01 server2 kernel: lowmem_reserve[]: 0 0 0 0
Aug 4 13:01:01 server2 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Aug 4 13:01:01 server2 kernel: lowmem_reserve[]: 0 0 0 0
Aug 4 13:01:01 server2 kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Aug 4 13:01:01 server2 kernel: lowmem_reserve[]: 0 0 0 0
Aug 4 13:01:01 server2 kernel: HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Aug 4 13:01:01 server2 kernel: lowmem_reserve[]: 0 0 0 0
Aug 4 13:01:01 server2 kernel: DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Aug 4 13:01:01 server2 kernel: DMA32: empty
Aug 4 13:01:01 server2 kernel: Normal: empty
Aug 4 13:01:01 server2 kernel: HighMem: empty
Aug 4 13:01:01 server2 kernel: 531563 pagecache pages
Aug 4 13:01:01 server2 kernel: Swap cache: add 2, delete 0, find 0/0, race 0+0
Aug 4 13:01:01 server2 kernel: Free swap = 16777200kB
Aug 4 13:01:01 server2 kernel: Total swap = 16777208kB
Aug 4 13:01:01 server2 kernel: Free swap: 16777200kB
Aug 4 13:01:01 server2 kernel: 788480 pages of RAM
Aug 4 13:01:01 server2 kernel: 35847 reserved pages
Aug 4 13:01:01 server2 kernel: 123018 pages shared
Aug 4 13:01:01 server2 kernel: 2 pages swap cached
Aug 4 13:01:01 server2 kernel: GFS2: fsid=Peace:raid2.0: gfs2_delete_inode: -12
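Incidentally, the two numeric codes in that log decode consistently: if these are 2.6-era kernels (which the trace suggests), mode:0xd0 is just the GFP_KERNEL flag combination, and the -12 in the final gfs2_delete_inode line is -ENOMEM, i.e. the delete failed because that very allocation failed. A quick sanity check (the flag bit values below are the 2.6 include/linux/gfp.h definitions, stated here as an assumption about your kernel version):

```python
import errno

# GFP flag bits per 2.6-era include/linux/gfp.h (assumed kernel version)
__GFP_WAIT, __GFP_IO, __GFP_FS = 0x10, 0x40, 0x80
GFP_KERNEL = __GFP_WAIT | __GFP_IO | __GFP_FS

print(hex(GFP_KERNEL))      # 0xd0 -- matches "mode:0xd0" in the log
print(errno.errorcode[12])  # ENOMEM -- matches "gfs2_delete_inode: -12"
```

So the GFS2 error is a symptom of kswapd failing an ordinary GFP_KERNEL allocation, not (necessarily) on-disk corruption by itself.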
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster