Hi,
recently the following appeared in dmesg during a test with very intensive,
mostly unaligned, sometimes overlapping IO (the test, of course, failed):
GFS2: fsid=cluster:stg0.0: fatal: invalid metadata block
GFS2: fsid=cluster:stg0.0: bh = 404971163 (magic number)
GFS2: fsid=cluster:stg0.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 365
GFS2: fsid=cluster:stg0.0: about to withdraw this file system
GFS2: fsid=cluster:stg0.0: dirty_inode: glock -5
GFS2: fsid=cluster:stg0.0: dirty_inode: glock -5
GFS2: fsid=cluster:stg0.0: dirty_inode: glock -5
GFS2: fsid=cluster:stg0.0: telling LM to unmount
GFS2: fsid=cluster:stg0.0: withdrawn
Pid: 135440, comm: test Not tainted 2.6.32-504.30.3.el6.x86_64 #1
Call Trace:
[<ffffffffa047eab8>] ? gfs2_lm_withdraw+0x128/0x160 [gfs2]
[<ffffffff8109eca0>] ? wake_bit_function+0x0/0x50
[<ffffffffa047ec15>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
[<ffffffffa0468c69>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
[<ffffffffa0452caa>] ? gfs2_block_map+0x2aa/0xf10 [gfs2]
[<ffffffff81014a29>] ? read_tsc+0x9/0x20
[<ffffffff810aab71>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff810f29e9>] ? delayacct_end+0x89/0xa0
[<ffffffff811c5310>] ? sync_buffer+0x0/0x50
[<ffffffff81181565>] ? mem_cgroup_charge_common+0xa5/0xd0
[<ffffffff811cf790>] ? do_mpage_readpage+0x150/0x5f0
[<ffffffff8114701e>] ? __inc_zone_page_state+0x2e/0x30
[<ffffffff8113b4e0>] ? __lru_cache_add+0x40/0x90
[<ffffffff811cfd89>] ? mpage_readpages+0xe9/0x130
[<ffffffffa0452a00>] ? gfs2_block_map+0x0/0xf10 [gfs2]
[<ffffffffa045e0cf>] ? gfs2_holder_wake+0x1f/0x30 [gfs2]
[<ffffffffa0452a00>] ? gfs2_block_map+0x0/0xf10 [gfs2]
[<ffffffffa045f0c5>] ? gfs2_glock_wait+0x25/0x90 [gfs2]
[<ffffffffa0462619>] ? gfs2_glock_nq+0x2c9/0x410 [gfs2]
[<ffffffffa046a356>] ? gfs2_readpages+0xc6/0xd0 [gfs2]
[<ffffffffa046a2e8>] ? gfs2_readpages+0x58/0xd0 [gfs2]
[<ffffffff8113a165>] ? __do_page_cache_readahead+0x185/0x210
[<ffffffff8129156d>] ? radix_tree_prev_hole+0x4d/0x60
[<ffffffff8113a53f>] ? ondemand_readahead+0xcf/0x240
[<ffffffff8113a7a3>] ? page_cache_sync_readahead+0x33/0x50
[<ffffffff81126208>] ? generic_file_aio_read+0x558/0x700
[<ffffffff8109ec0f>] ? wake_up_bit+0x2f/0x40
[<ffffffff8118e1ea>] ? do_sync_read+0xfa/0x140
[<ffffffffa045fc4e>] ? gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
[<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81193cd4>] ? cp_new_stat+0xe4/0x100
[<ffffffff8122da86>] ? security_file_permission+0x16/0x20
[<ffffffff8118eba5>] ? vfs_read+0xb5/0x1a0
[<ffffffff8118eed2>] ? sys_pread64+0x82/0xa0
[<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
What could be the reason for this?
The kernel is 2.6.32-504.30.3.el6.x86_64.
The filesystem is mounted on one node only.
The hardware is a Dell R630 with a PERC H730P Mini (MegaRAID SAS-3 3108
[Invader] rev 02) and a six-drive RAID-5 on Intel SSDSC2BX40 (S3610 400GB)
SSDs. Both the drive WB cache (capacitor-based) and the controller WB cache
(battery-backed) are turned on.
I use a home-brew gfs_controld (on top of corosync2). The tail of its dump is:
...
1447274995 stg0 receive_mount_done from 1 result 0
1447274995 stg0 wait_recoveries done
1447682342 uevent offline gfs2 /fs/gfs2/cluster:stg0
1447682342 withdraw: stg0
1447682342 stg0 run_dmsetup_suspend 253:4
1447682343 stg0 dmsetup_suspend_done result 0
1447682343 dmsetup_wait off
1447682343 stg0 receive_withdraw from 1
1447682343 stg0 wait_recoveries done
1447682343 gfs:mount:stg0 conf 0 0 1 memb join left 1
1447682343 stg0 confchg for our leave
1447682343 stg0 set /sys/fs/gfs2/cluster:stg0/lock_module/withdraw to 1
1447682343 cpg_dispatch error 9
The df values (total / used / available / use%) are:
blocks: 1.9T 82G 1.8T 5%
inodes: 445M 38K 445M 1%
Is there any additional information I should provide while the system is not
powered down and the filesystem is not unmounted?
Best,
Vladislav
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster