GFS journaling related crash

GFS experts,
 
I'm looking for help debugging this crash. I'm running GFS with a very small journal size. I understand this is a bit of uncharted territory for GFS, but we need to be really stingy with journal overhead.
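 
For reference, a GFS filesystem with a deliberately small journal would be made with something along the lines of the gfs_mkfs command below; the device, journal count, and journal size shown here are only placeholders to illustrate the setup, not my exact command:
 
[root@cfs1 ~]$ gfs_mkfs -p lock_dlm -t cisco:gfs2 -j 3 -J 32 /dev/hda12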
 
GFS is consistently crashing with the same stack trace, which seems related to the journal log being committed to disk. gfs_tool counters shows log space used close to 100%, and iostat shows a huge write just before the crash. Probably the journal blocks being flushed to disk?
 
I tried the workaround of reducing the "max_atomic_write" value listed in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146672, but no luck.
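 
For reference, I applied that workaround with gfs_tool settune, roughly as below; the value is just an example of a reduced setting, not necessarily the exact one from the bug report:
 
[root@cfs1 ~]$ gfs_tool settune /mnt/gfs2 max_atomic_write 1048576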
 
Could this be a GFS bug that only gets exercised with a small journal size?
 
The full stack trace, gfs_tool counters, and iostat data are listed below. I'd appreciate any help!
 
- Sridharan
Cisco Systems, Inc
 
---------------------
 
Logs:

Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1: Looking at journal...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1: Done
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Trying to acquire journal lock...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Looking at journal...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Done
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: head_off = 61856, head_wrap = 8
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: dump_off = 61856, dump_wrap = 7
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: assertion "FALSE" failed
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   function = check_seg_usage
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/log.c, line = 590
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   time = 1171482110
Feb 14 11:41:50 cfs1 kernel: ------------[ cut here ]------------
Feb 14 11:41:50 cfs1 kernel: kernel BUG at /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/util.c:211!
Feb 14 11:41:50 cfs1 kernel: invalid operand: 0000 [#1]
Feb 14 11:41:50 cfs1 kernel: SMP
Feb 14 11:41:50 cfs1 kernel: Modules linked in: lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) nfsd exportfs nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random e1000 e100 mii floppy ext3 jbd
Feb 14 11:41:50 cfs1 kernel: CPU:    0
Feb 14 11:41:50 cfs1 kernel: EIP:    0060:[<e10864f6>]    Tainted: GF     VLI
Feb 14 11:41:50 cfs1 kernel: EFLAGS: 00010246   (2.6.9-42.7.ELsmp)
Feb 14 11:41:50 cfs1 kernel: EIP is at gfs_assert_i+0x38/0x69 [gfs]
Feb 14 11:41:50 cfs1 kernel: eax: 000000f8   ebx: e0c9e000   ecx: c7cb3b7c   edx: e108d0a2
Feb 14 11:41:50 cfs1 kernel: esi: e108702a   edi: e108a021   ebp: 0000024e   esp: c7cb3b78
Feb 14 11:41:50 cfs1 kernel: ds: 007b   es: 007b   ss: 0068
Feb 14 11:41:50 cfs1 kernel: Process bonnie++ (pid: 8091, threadinfo=c7cb3000 task=df3c43b0)
Feb 14 11:41:50 cfs1 kernel: Stack: e108d0a2 e0cc2734 e108a021 e0cc2734 e108702a e0cc2734 e1089e34 0000024e
Feb 14 11:41:50 cfs1 kernel:        e0cc2734 45d365fe 00000000 0000f190 c48b9080 e0c9e000 e106e880 e1089e34
Feb 14 11:41:50 cfs1 kernel:        e0cc2734 45d365fe 00000000 0000f190 c48b9080 e0c9e000 e106e880 e1089e34
Feb 14 11:41:50 cfs1 kernel:        0000024e 00000007 00000000 0000f1a0 00000000 0000f1c0 00000000 00000008
Feb 14 11:41:50 cfs1 kernel:        0000024e 00000007 00000000 0000f1a0 00000000 0000f1c0 00000000 00000008
Feb 14 11:41:50 cfs1 kernel: Call Trace:
Feb 14 11:41:50 cfs1 kernel:  [<e106e880>] check_seg_usage+0x197/0x19f [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106ea03>] sync_trans+0x143/0x1b1 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1071641>] quota_trans_size+0x20/0x36 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106ebdd>] disk_commit+0xec/0x264 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106edf6>] log_refund+0x61/0x187 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106f235>] log_flush_internal+0xec/0x19e [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106e131>] gfs_log_reserve+0x19e/0x20e [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1063f77>] glock_wait_internal+0x1e3/0x1ef [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10642e1>] gfs_glock_nq+0xe3/0x116 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1085334>] gfs_trans_begin_i+0xfd/0x15a [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106842e>] inode_init_and_link+0x1fe/0x388 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10648a3>] gfs_glock_nq_init+0x13/0x26 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10648f4>] gfs_glock_nq_num+0x2e/0x71 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1068767>] gfs_createi+0x1af/0x1f1 [gfs]
Feb 14 11:41:51 cfs1 kernel:  [<e107a9d0>] gfs_create+0x68/0x16f [gfs]
Feb 14 11:41:51 cfs1 kernel:  [<c0167bd0>] vfs_create+0xbc/0x103
Feb 14 11:41:51 cfs1 kernel:  [<c0167fa8>] open_namei+0x177/0x579
Feb 14 11:41:51 cfs1 kernel:  [<c015a442>] filp_open+0x45/0x70
Feb 14 11:41:51 cfs1 kernel:  [<c02d2e66>] __cond_resched+0x14/0x39
Feb 14 11:41:51 cfs1 kernel:  [<c01c3326>] direct_strncpy_from_user+0x3e/0x5d
Feb 14 11:41:51 cfs1 kernel:  [<c015a79e>] sys_open+0x31/0x7d
Feb 14 11:41:51 cfs1 kernel:  [<c015a7fc>] sys_creat+0x12/0x16
Feb 14 11:41:51 cfs1 kernel:  [<c02d48bb>] syscall_call+0x7/0xb
Feb 14 11:41:51 cfs1 kernel:  [<c02d007b>] packet_rcv+0x8e/0x307
Feb 14 11:41:51 cfs1 kernel: Code: c3 83 b8 30 02 00 00 00 74 2c ff 35 90 7d 43 c0 8d 80 34 47 02 00 50 55 ff 74 24 20 50 51 50 52 50 68 a2 d0 08 e1 e8 ac c3 09 df <0f> 0b d3 00 25 d1 08 e1 83 c4 28 e8 9d f8 07 df ff 35 90 7d 43
Feb 14 11:41:51 cfs1 kernel:  <0>Fatal exception: panic in 5 seconds
 
[root@cfs1 ~]$ gfs_tool counters /mnt/gfs2
 
                                  locks 11451
                             locks held 11450
                           freeze count 0
                          incore inodes 5713
                       metadata buffers 5838
                        unlinked inodes 1
                              quota IDs 2
                     incore log buffers 893
                         log space used 98.41%
              meta header cache entries 53
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 7
                   outstanding LM calls 0
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 9
                         glock nq calls 131554
                         glock dq calls 125830
                   glock prefetch calls 0
                          lm_lock calls 11484
                        lm_unlock calls 12
                           lm callbacks 11496
                     address operations 0
                      dentry operations 4
                      export operations 0
                        file operations 17177
                       inode operations 45696
                       super operations 45690
                          vm operations 0
                        block I/O reads 52
                       block I/O writes 17269
 
 
[root@cfs1 ~]$ iostat 2 1000 | grep hda12
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda12             0.42         1.52         1.84       2242       2720
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             2.51         0.00        20.10          0         40
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12          4467.68         0.00     35741.41          0      70768
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
