Re: kjournald panic in 2.4.20 RedHat 7.2

Michael Harris <mike@igconcepts.com> · Fri, 18 Apr 2003 01:20:21 -0500

Hi, I have the machine back online. /dev/sda4 (the partition that crashed) recovered
in about 2 seconds with the only e2fsck output being "recovering journal", so I am
running with it.

Here are more details on the machine:

[*ROOT* mofo /home/mgh 23 ] cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) XP 1800+
stepping        : 2
cpu MHz         : 1534.037
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3060.53
[*ROOT* mofo /home/mgh 24 ] uname -a
Linux mofo 2.4.20 #14 Wed Mar 19 16:48:34 CST 2003 i686 unknown
[*ROOT* mofo /home/mgh 25 ] df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1              8064272   4530996   3123624  60% /
/dev/hda3             29387900   1485288  26409772   6% /home
none                    127884         0    127884   0% /dev/shm
/dev/sda3            151195204 138014604   5500328  97% /mnt/sda3
/dev/sda4            193010776  75844724 107361584  42% /mnt/sda4
/dev/sda1             33032196  27801288   3552924  89% /mnt/sda1
[*ROOT* mofo /home/mgh 26 ] mount
/dev/hda1 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda3 on /home type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda3 on /mnt/sda3 type ext2 (rw)
/dev/sda4 on /mnt/sda4 type ext3 (rw)
/dev/sda1 on /mnt/sda1 type ext2 (rw)
[*ROOT* mofo /home/mgh 27 ] cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  261910528 249167872 12742656        0 25636864 86761472
Swap: 1052827648 15437824 1037389824
MemTotal:       255772 kB
MemFree:         12444 kB
MemShared:           0 kB
Buffers:         25036 kB
Cached:          80408 kB
SwapCached:       4320 kB
Active:         133188 kB
Inactive:        91704 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       255772 kB
LowFree:         12444 kB
SwapTotal:     1028152 kB
SwapFree:      1013076 kB

[*ROOT* mofo /home/mgh 30 ] lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP]
00:09.0 Communication controller: Cyclades Corporation PC300 TE 2 (rev 01)
00:0b.0 SCSI storage controller: Adaptec AIC-7881U
00:0d.0 Ethernet controller: Bridgecom, Inc: Unknown device 0985 (rev 11)
00:0f.0 Ethernet controller: Bridgecom, Inc: Unknown device 0985 (rev 11)
00:10.0 VGA compatible controller: Silicon Integrated Systems [SiS] 82C204 (rev 21)
00:11.0 ISA bridge: VIA Technologies, Inc.: Unknown device 3147
00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:11.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)
00:11.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)

sda is an external Belkin RAID on an Adaptec 2940:

Apr 17 23:56:47 mofo kernel: SCSI subsystem driver Revision: 1.00
Apr 17 23:56:47 mofo kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
Apr 17 23:56:47 mofo kernel:         <Adaptec 2940 Ultra SCSI adapter>
Apr 17 23:56:47 mofo kernel:         aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
Apr 17 23:56:47 mofo kernel:
Apr 17 23:56:47 mofo kernel:   Vendor: BellStor  Model:                   Rev:
Apr 17 23:56:47 mofo kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Apr 17 23:56:47 mofo kernel: (scsi0:A:3): 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Apr 17 23:56:47 mofo kernel: scsi0:A:3:0: Tagged Queuing enabled.  Depth 253
Apr 17 23:56:47 mofo kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
Apr 17 23:56:47 mofo kernel: SCSI device sda: 1073723392 512-byte hdwr sectors (549746 MB)
Apr 17 23:56:48 mofo kernel:  sda: sda1 sda2 sda3 sda4
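
As a quick cross-check of the size the kernel reports for sda, the arithmetic can be sketched in Python. This assumes the "MB" in the log line means 10^6 bytes, which is only inferred from the numbers lining up, not stated anywhere in the log; the variable names are made up for illustration.

```python
# Sector count and sector size taken from the "SCSI device sda" log line.
SECTORS = 1073723392
SECTOR_BYTES = 512

total_bytes = SECTORS * SECTOR_BYTES

# Assumption: the kernel's "MB" figure is decimal (10**6 bytes).
decimal_mb = total_bytes // 10**6

# The same capacity in binary units (2**20 bytes), for comparison.
binary_mib = total_bytes // 2**20

print(decimal_mb)  # compare with the "(549746 MB)" in the log line
print(binary_mib)
```

If the decimal assumption holds, the first figure should agree with the log, and the second shows how much smaller the number looks in binary units.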

[*ROOT* mofo /usr/src 201 ] fdisk /dev/sda

The number of cylinders for this disk is set to 66836.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 66836 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      4178  33559753+  83  Linux
/dev/sda2          4179     23301 153605497+  83  Linux
/dev/sda3         23302     42424 153605497+  83  Linux
/dev/sda4         42425     66836 196089390   83  Linux

Command (m for help): q
[*ROOT* mofo /usr/src 202 ] cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
none                    /dev/pts                devpts  gid=5,mode=620  0 0
LABEL=/home             /home                   ext3    defaults        1 2
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/hda2               swap                    swap    defaults        0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0
/dev/sda1               /mnt/sda1               ext2    noauto 0 0
/dev/sda2               /mnt/sda2               ext2    noauto 0 0
/dev/sda3               /mnt/sda3               ext2    noauto 0 0
/dev/sda4               /mnt/sda4               ext3    noauto 0 0
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,ro 0 0
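
The block counts in the fdisk table above can be sanity-checked against the reported geometry. This is a rough sketch, assuming the Blocks column is in 1 KiB units, that a trailing "+" marks a leftover half block, and that sda1 starts one track (63 sectors) into the disk; the function name is hypothetical.

```python
# Geometry from "255 heads, 63 sectors" and "Units = cylinders of 16065 * 512 bytes".
SECTORS_PER_CYLINDER = 255 * 63  # 16065
SECTOR_BYTES = 512
FIRST_TRACK_SECTORS = 63  # assumption: the first partition skips one track


def blocks_1k(start_cyl, end_cyl, skipped_sectors=0):
    """Return a partition's size in 1 KiB blocks (may end in .5)."""
    cylinders = end_cyl - start_cyl + 1
    sectors = cylinders * SECTORS_PER_CYLINDER - skipped_sectors
    return sectors * SECTOR_BYTES / 1024


print(blocks_1k(1, 4178, FIRST_TRACK_SECTORS))  # sda1
print(blocks_1k(4179, 23301))                   # sda2
print(blocks_1k(23302, 42424))                  # sda3
print(blocks_1k(42425, 66836))                  # sda4
```

A result ending in .5 corresponds to an entry shown with "+" in the table, so the four values can be compared directly with the Blocks column.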

The kernel is a minimal 2.4.20 with the freeswan 1.99 patch applied. Not that it could not be
related, but I have been running freeswan since 1999 on 40+ machines on various kernels without
any problem. I also applied the pc300-3.4.7 patch to support the Cyclades PC300 T1 card.
Otherwise the kernel is as stripped down as I could make it.

CPU option is "(Athlon/Duron/K7) Processor family"
modules support disabled, everything compiled in statically

There are no SCSI or other hardware errors surrounding the kjournald crash (or ever).
After kjournald crashed I could run df without it hanging, but an ls on /mnt/sda4 hung, as did
all other processes hitting it (remote NT machines using Samba). killall -9 smbd never worked,
and umount /mnt/sda4 reported busy. The load average jumped to about 30 during all this.
umount /mnt/sda1 worked, but fsck showed it as uncleanly unmounted, though it didn't find any
errors. I could not umount /dev/sda3 because it was busy, but finally did a
umount -km /mnt/sda3, which killed my shell, and I was unable to log in thereafter.

Without thinking too much about it, I deleted the file being moved to sda4
when it crashed. Only maillog.MYD had copied over and it showed a size of about 270 MB.

As far as the error itself, it looks like /usr/src/linux/fs/jbd/transaction.c:1384 is:

   J_ASSERT (journal_current_handle() == handle)

in function journal_stop(), though anyone on this board is 90 steps ahead of me as to what
this asserts.
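
For what it is worth, here is a toy model of what that check appears to enforce. It assumes journal_current_handle() just returns the handle recorded for the running task (current->journal_info in the 2.4 JBD headers, as far as I can tell). Everything named Model or model_ below is hypothetical and only illustrates the invariant; it ignores nested handles and is not kernel code or a diagnosis of this crash.

```python
class ModelTask:
    """Hypothetical stand-in for the per-task state the kernel keeps."""

    def __init__(self):
        self.journal_info = None  # handle this task believes it holds


def model_journal_start(task):
    """Create a handle and record it as the task's current handle."""
    handle = object()
    task.journal_info = handle
    return handle


def model_journal_stop(task, handle):
    """Stop a handle; it must be the one recorded as current for the task."""
    if task.journal_info is not handle:
        # Plays the role of J_ASSERT(journal_current_handle() == handle).
        raise AssertionError("journal_current_handle() == handle")
    task.journal_info = None


task = ModelTask()
h = model_journal_start(task)
model_journal_stop(task, h)  # matching handle: no error

h = model_journal_start(task)
task.journal_info = object()  # simulate the recorded handle changing underneath
try:
    model_journal_stop(task, h)
except AssertionError as exc:
    print("assertion failed:", exc)
```

In this model the failure only fires when the handle passed to the stop call is not the one the task has on record, which is all the quoted assertion text itself says.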

Also, rereading the ext3 FAQ at http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html,
it looks like 2.4.16 and up should not require the ext3-0.0.7a patch.

If there is any other information I can provide, please ask, and thanks for your help.

Mike

> Hi, if this is a redundant post I apologize. I am running 2.4.20 on what has been
> a very stable Athlon machine for months, tried to move a 2 GB file from an ext2
> partition to an ext3 one, and kjournald crashed. Here are the last remnants of my
> shell scrollback:
> 
> [*ROOT* mofo /mnt/sda1/mysql/fd 641 ] ll oldmail/
> total 2363288
> -rw-rw----    1 mysql    mysql    2147483647 Jan 23 18:04 maillog.MYD
> -rw-rw----    1 mysql    mysql    270138368 Jan 23 18:06 maillog.MYI
> -rw-rw----    1 mysql    mysql        8910 Mar 22  2002 maillog.frm
> [*ROOT* mofo /mnt/sda1/mysql/fd 642 ] df
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/hda1              8064272   4529888   3124732  60% /
> /dev/hda3             29387900   1488316  26406744   6% /home
> none                    127884         0    127884   0% /dev/shm
> /dev/sda1             33032196  30162240   1191972  97% /mnt/sda1
> /dev/sda3            151195204 138014604   5500328  97% /mnt/sda3
> /dev/sda4            193010776  75750204 107456104  42% /mnt/sda4
> [*ROOT* mofo /mnt/sda1/mysql/fd 643 ] mv oldmail/* /mnt/sda4/mgh/oldmysqllogs/
> Segmentation fault
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ]
> Message from syslogd@mofo at Thu Apr 17 21:40:13 2003 ...
> mofo kernel: Assertion failure in journal_stop() at transaction.c:1384: "journal_current_handle() == handle"
> 
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ]
> [*ROOT* mofo /mnt/sda1/mysql/fd 644 ] fg
> 
> Anything accessing /mnt/sda4 hung at this point (smbd among others) and I could
> not cleanly shut down the machine. Finally a umount -km /mnt/sda3 (not sda4) killed lots
> of procs, among them sshd, and it is game over until a guy gets onsite to hit the reset button.
> 
> I can't access the machine at the moment, but this looks like a hot list so I am
> posting what I can. It is an Athlon XP 2000+ with 256 MB DDR (not certain on speed,
> definitely an Athlon XP) running straight 2.4.20 from the bz2 at ftp.kernel.org
> w/o module support, compiled for Athlon, ext3 compiled in statically, and again this
> has been acting as a MySQL server for months without a hitch. It is a Red Hat 7.2 dist
> with all the updates as of about one month ago installed, less the custom kernel.
> The file I was moving, as you can see, is a 2 GB file, i.e. right at the limit of
> ext2 capacity, and I am wondering if this is the culprit.
> 
> Here is what was logged before I lost the machine:
> 
> Apr 17 21:40:13 mofo kernel: kernel BUG at transaction.c:1384!
> Apr 17 21:40:13 mofo kernel: invalid operand: 0000
> Apr 17 21:40:13 mofo kernel: CPU:    0
> Apr 17 21:40:13 mofo kernel: EIP:    0010:[journal_stop+108/560]    Not tainted
> Apr 17 21:40:13 mofo kernel: EIP:    0010:[<c0158eec>]    Not tainted
> Apr 17 21:40:13 mofo kernel: EFLAGS: 00010282
> Apr 17 21:40:13 mofo kernel: eax: 00000063   ebx: 00000001   ecx: 00000009   edx: c831bf44
> Apr 17 21:40:13 mofo kernel: esi: cdcc7a40   edi: c3739e80   ebp: ccd18ec0   esp: c69e9a00
> Apr 17 21:40:13 mofo kernel: ds: 0018   es: 0018   ss: 0018
> Apr 17 21:40:13 mofo kernel: Process mv (pid: 8133, stackpage=c69e9000)
> Apr 17 21:40:13 mofo kernel: Stack: c03250a0 c0320f67 c0320d18 00000568 c0327540 00000000 00000000 c3739e80
> Apr 17 21:40:13 mofo kernel:        cda5e900 c3739e80 c0152617 c3739e80 00000000 c0158935 cbc83930 00000000
> Apr 17 21:40:13 mofo kernel:        c313bc90 cdcc7a40 ca39fec0 ccd18ec0 cda5e900 cc283600 00000007 c013e3ce
> Apr 17 21:40:13 mofo kernel: Call Trace:    [ext3_dirty_inode+199/256] [journal_get_undo_access+245/288] [__mark_inode_dirty+46/144] [ext3_new_block+112/1936] [journal_cancel_revoke+251/368]
> Apr 17 21:40:13 mofo kernel: Call Trace:    [<c0152617>] [<c0158935>] [<c013e3ce>] [<c014d370>] [<c015ca9b>]
> Apr 17 21:40:13 mofo kernel:   [do_get_write_access+1183/1216] [journal_dirty_metadata+398/432] [ext3_do_update_inode+759/896] [ext3_do_update_inode+852/896] [ip_nat_fn+467/480] [ipt_hook+28/32]
> Apr 17 21:40:13 mofo kernel:   [<c015861f>] [<c0158c8e>] [<c0152117>] [<c0152174>] [<c02cfe53>] [<c02cfb2c>]
> Apr 17 21:40:13 mofo kernel:   [journal_cancel_revoke+251/368] [do_get_write_access+1183/1216] [tcp_packet+309/336] [journal_get_write_access+55/80] [journal_cancel_revoke+251/368] [do_get_write_access+1183/1216]
> Apr 17 21:40:13 mofo kernel:   [<c015ca9b>] [<c015861f>] [<c02cbf85>] [<c0158677>] [<c015ca9b>] [<c015861f>]
> Apr 17 21:40:13 mofo kernel:   [ext3_alloc_block+25/32] [ext3_alloc_branch+85/720] [getblk+40/96] [getblk+57/96] [bread+22/112] [ext3_do_update_inode+759/896]
> Apr 17 21:40:13 mofo kernel:   [<c014f649>] [<c014f965>] [<c012e778>] [<c012e789>] [<c012e9c6>] [<c0152117>]
> Apr 17 21:40:13 mofo kernel:   [ext3_do_update_inode+852/896] [do_get_write_access+1183/1216] [ext3_get_branch+83/208] [ext3_get_block_handle+437/688] [do_get_write_access+1183/1216] [create_buffers+97/240]
> Apr 17 21:40:13 mofo kernel:   [<c0152174>] [<c015861f>] [<c014f7d3>] [<c0150035>] [<c015861f>] [<c012ebd1>]
> Apr 17 21:40:13 mofo kernel:   [ext3_get_block+89/96] [__block_prepare_write+230/768] [__jbd_kmalloc+39/160] [block_prepare_write+29/64] [ext3_get_block+0/96] [ext3_prepare_write+124/288]
> Apr 17 21:40:13 mofo kernel:   [<c0150189>] [<c012f126>] [<c015e757>] [<c012f9ad>] [<c0150130>] [<c01505dc>]
> Apr 17 21:40:13 mofo kernel:   [ext3_get_block+0/96] [generic_file_write+1185/1760] [ext3_file_write+31/176] [sys_write+149/240] [schedule+786/832] [system_call+51/56]
> Apr 17 21:40:13 mofo kernel:   [<c0150130>] [<c0122b91>] [<c014e13f>] [<c012ce25>] [<c0110222>] [<c0106d83>]
> Apr 17 21:40:13 mofo kernel:
> Apr 17 21:40:13 mofo kernel: Code: 0f 0b 68 05 18 0d 32 c0 83 c4 14 f6 47 18 04 ba 01 00 00 00
> 
> Looking at http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html where I found the link to
> this list, it says to use ext3-0.0.7a.tar.bz2, which looks like a kernel patch, which I have not
> applied. The kernel was compiled from the 2.4.20 dist with no ext3 patches. I did install
> e2fsprogs-1.32 but no kernel patches. If this is the issue, please just tell me I am an
> idiot and I will be gone. I am 99% sure this is not a hardware issue.
> 
> My first priority is getting the machine on its feet along with that partition, whose integrity
> I now question. Can I substitute ext2 for ext3 in fstab and mount it as ext2, after ext2 fscking
> it?
> 
> If you have a moment to spare, any insight on this late good Thursday would do me a great favor,
> and maybe I have found a legitimate bug here. I should have the machine online in 30 minutes
> if there is more info I can provide.
> 
> Thanks,
> Mike
> 
> 
> 
> _______________________________________________
> 
> Ext3-users@redhat.com
> https://listman.redhat.com/mailman/listinfo/ext3-users
