metadata_csum Oops

The machine on which I reported the earlier issues has not glitched yet.
However, I have some more fun to report.

To further test metadata checksums, I enabled it on my desktop machine.
(Also has SSE4.2, so the checksum overhead should be minimal.)  This is
a Debian/unstable machine, with a 64-bit kernel (v3.5 + ext4-for-linus)
and 32-bit userland, with e2fsprogs from the next branch.
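
For reference, the SSE4.2 CRC32 instruction computes CRC-32C (the
Castagnoli polynomial), which is the checksum metadata_csum uses.  Below
is a minimal bitwise sketch of that checksum in Python.  It only
illustrates the algorithm; the function name is mine, and the way ext4
seeds the checksum for each on-disk structure is not shown.

```python
# Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
# Illustrative only: slow, and it does not reproduce ext4's
# per-structure seeding.

def crc32c(data, crc=0):
    """Return the CRC-32C of `data` (bytes), continuing from `crc`."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell off.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Standard check value for CRC-32C.
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```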

This time I included the root FS, which gave some interesting issues
last night during the network backup run.  This morning I was greeted by:

BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff810ffbf9>] ext4_readdir+0x1e2/0x5a8
PGD 1589067 PUD 158a067 PMD 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in:  battery nfsd exportfs deflate zlib_deflate zlib_inflate ctr <whole bunch of crypto modules snipped> crypto_null af_key xfrm_algo fuse ftdi_sio usbserial r8169
Pid: 31650, comm: rsync Not tainted 3.5.0-00032-g0e1gf37 #55 Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H
RIP: 0010:[<ffffffff810ffbf9>]  [<ffffffff810ffbf9>] ext4_readdir+0x1e2/0x5a8
RSP: 0018:ffff88010eb65e38  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800b6763300 RCX: 0000000000000000
RDX: ffffffff810e5d99 RSI: ffff88010eb65f40 RDI: ffff8800b6763300
RBP: ffff88010eb65ed8 R08: 0000000000013750 R09: ffffea000445bd40
R10: 0000000000000000 R11: ffffffff8111a578 R12: ffff880108808740
R13: ffff88003ab13748 R14: ffff8801137e8400 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880117c80000(0063) knlGS:00000000f760d6c0
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: fffffffffffffff8 CR3: 000000011177e000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rsync (pid: 31650, threadinfo ffff88010eb64000, task ffff880111d6a5e0)
Stack:
 ffff88010eb65f08 ffffffff810bc5e9 ffff880113782220 ffff880000000000
 00000005756e69e4 ffff88011379501e ffffffff810e5d99 ffff88010eb65f40 
 ffff88003ab13748 ffff88003ab13748 0000000000000000 ffffffff813dc01c
Call Trace:
 [<ffffffff810bc5e9>] ? do_filp_open+0x33/0x81
 [<ffffffff810e5d99>] ? compat_filldir+0xdd/0xdd
 [<ffffffff813dc01c>] ? _cond_resched+0x9/0x1d
 [<ffffffff8104437b>] ? should_resched+0x9/0x28
 [<ffffffff810e5d99>] ? compat_filldir+0xdd/0xdd
 [<ffffffff810be52e>] vfs_readdir+0x61/0x9a
 [<ffffffff810aa3b5>] ? kmem_cache_free+0x15/0x6e
 [<ffffffff810e729e>] compat_sys_getdents64+0x72/0xcc
 [<ffffffff813de59b>] sysenter_dispatch+0x7/0x1e
Code: 00 83 f8 00 0f 8c a6 00 00 00 75 02 eb 6d 4c 89 e7 e8 49 5b 09 00 49 89 44 24 08 49 8b 4c 24 08 48 8b 55 90 48 89 df 48 8b 75 98 <8b> 41 f8 41 89 44 24 20 8b 41 fc 48 83 e9 08 41 89 44 24 24 e8
RIP  [<ffffffff810ffbf9>] ext4_readdir+0x1e2/0x5a8
 RSP <ffff88010eb65e38>
CR2: fffffffffffffff8

(The above was hand-transcribed, so I *hope* I got all the hex correct!)
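
As a sanity check on the transcription: the marked bytes in the Code:
line, "<8b> 41 f8", decode (by my reading) as a 32-bit load from
RCX - 8, and the register dump shows RCX = 0.  The sketch below just
redoes that address arithmetic; the helper names are made up for
illustration.

```python
# Recompute the faulting address from the register dump and the
# 8-bit displacement in the faulting instruction (8b 41 f8).

MASK64 = (1 << 64) - 1


def sign_extend_8(value):
    """Interpret an 8-bit displacement as a signed integer."""
    return value - 0x100 if value & 0x80 else value


def effective_address(base, disp8):
    """Base register plus signed 8-bit displacement, modulo 2**64."""
    return (base + sign_extend_8(disp8)) & MASK64


if __name__ == "__main__":
    rcx = 0x0000000000000000   # RCX from the oops
    disp = 0xF8                # third byte of "8b 41 f8", i.e. -8
    print(hex(effective_address(rcx, disp)))  # 0xfffffffffffffff8
```

That matches the reported CR2, so the fault looks like a read at a small
negative offset from a NULL pointer.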

Anyway, on reboot, the system came up, but would not fsck, printing:

fsck.ext4: Superblock checksum does not match superblock while trying to open /dev/sda2
/dev/sda2:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

fsck died with exit code 8
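
For anyone decoding that exit status: the e2fsck man page documents the
exit code as a sum of condition bits, and 8 is "operational error".  A
small sketch of the decoding (the table follows the man page; the
function name is hypothetical):

```python
# Decode an fsck/e2fsck exit status into its documented conditions.

FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}


def decode_fsck_exit(code):
    """Return the list of conditions encoded in an fsck exit status."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in sorted(FSCK_EXIT_BITS.items()) if code & bit]


if __name__ == "__main__":
    print(decode_fsck_exit(8))  # ['operational error']
```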


So I logged in by hand and tried to run the recommended command, but
including "-b 8193" produced the same message as above, while *omitting*
it let e2fsck run successfully.  I'm a little confused by that.
(The "fsck" wrapper invoked from /etc/init.d/checkroot.sh is 2.20.1-5.1.)
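
One possible explanation for the "-b 8193" failure: per the e2fsck man
page, the backup superblock location depends on the block size (8193
for 1k blocks, 16384 for 2k, 32768 for 4k), and the hint in the error
message only covers the 1k case.  The sketch below assumes the default
mke2fs geometry of 8 * block_size blocks per group, and assumes this
filesystem uses 4k blocks, which I have not verified directly; the
block and inode totals e2fsck reports further down are at least
consistent with it.

```python
# Where the first backup superblock lives for a given block size,
# ASSUMING the default of 8 * block_size blocks per group.

def first_data_block(block_size):
    """Block 0 holds the boot area only on 1 KiB-block filesystems."""
    return 1 if block_size == 1024 else 0


def first_backup_superblock(block_size):
    """Block number of the backup superblock at the start of group 1."""
    return first_data_block(block_size) + 8 * block_size


def group_count(total_blocks, block_size):
    """Number of block groups, rounding the last partial group up."""
    per_group = 8 * block_size
    usable = total_blocks - first_data_block(block_size)
    return -(-usable // per_group)


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, first_backup_superblock(size))
    # Totals taken from the e2fsck summary: 9765511 blocks, 980720 inodes.
    groups = group_count(9765511, 4096)
    print(groups, 980720 % groups)  # 299 0
```

If the 4k assumption holds, the command to try would have been
"e2fsck -b 32768 <device>" rather than "-b 8193".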

For some limited values of "successfully"; it sure found a lot of problems:

(This is the second run; I stopped the first and started capturing it
when it was obvious pencil and paper was impractical.)

Script started on Wed Aug  8 08:40:45 2012
/run# e2fsck -y -v -C0 /dev/sda2
e2fsck 1.43-WIP (1-Aug-2012)
root was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
... some similar errors from earlier run omitted ...
Inode 947580 checksum does not match inode.  Clear? yes
Inode 947581 checksum does not match inode.  Clear? yes
Inode 947582 checksum does not match inode.  Clear? yes
Inode 947583 checksum does not match inode.  Clear? yes
Inode 947584 checksum does not match inode.  Clear? yes
Inode 947585 checksum does not match inode.  Clear? yes
Inode 947586 checksum does not match inode.  Clear? yes
Inode 947587 checksum does not match inode.  Clear? yes
Inode 947588 checksum does not match inode.  Clear? yes
Inode 947589 checksum does not match inode.  Clear? yes
Inode 947590 checksum does not match inode.  Clear? yes
Inode 947591 checksum does not match inode.  Clear? yes
... 315 additional deleted ...
Inode 947920 checksum does not match inode.  Clear? yes

Pass 2: Checking directory structure
Entry 'linux' in /usr/arm-linux-gnueabi/include (534036) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534570) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534582) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534660) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534681) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534695) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534737) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534785) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534796) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534805) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534820) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534824) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534860) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534911) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534947) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534959) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (534978) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (535012) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (535026) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (535045) has deleted/unused inode 534518.  Clear? yes
Entry '..' in ??? (535065) has deleted/unused inode 534518.  Clear? yes

Pass 3: Checking directory connectivity
Unconnected directory inode 534570 (...)
Connect to /lost+found? yes
... duplicates snipped ...

Pass 4: Checking reference counts
Inode 534036 ref count is 30, should be 29.  Fix? yes
Inode 534570 ref count is 3, should be 2.  Fix? yes
Inode 534582 ref count is 4, should be 3.  Fix? yes
Inode 534660 ref count is 3, should be 2.  Fix? yes
Inode 534681 ref count is 3, should be 2.  Fix? yes
Inode 534695 ref count is 3, should be 2.  Fix? yes
Inode 534737 ref count is 3, should be 2.  Fix? yes
Inode 534785 ref count is 3, should be 2.  Fix? yes
Unattached inode 534795  Connect to /lost+found? yes
Inode 534795 ref count is 2, should be 1.  Fix? yes
Inode 534796 ref count is 3, should be 2.  Fix? yes
Unattached inode 534797  Connect to /lost+found? yes
Inode 534797 ref count is 2, should be 1.  Fix? yes
... snip ...
Unattached inode 542215  Connect to /lost+found? yes
Inode 542215 ref count is 2, should be 1.  Fix? yes

Pass 5: Checking group summary information
Block bitmap differences:  -(3833858--3833859) -3833862 -(3833864--3833865) -3833867 -3833870 -(3833872--3833873)  ...2.5 MB snipped...
Fix? yes

Free blocks count wrong for group #0 (27905, counted=27767).
Fix? yes
Free blocks count wrong for group #1 (2607, counted=2109).
Fix? yes
Free blocks count wrong for group #2 (4062, counted=4164).
Fix? yes
... snip ...
Free blocks count wrong for group #289 (21421, counted=21608).
Fix? yes
Free blocks count wrong (6043152, counted=6026661).
Fix? yes

Inode bitmap differences:  -(26241--26243) -(26245--26247) -26253 -26263 -26268 -(26271--26272) -(26275--26276) -26278 ... 1,000,000 bytes snipped ...
Fix? yes

Free inodes count wrong for group #0 (2, counted=31).
Fix? yes
Free inodes count wrong for group #1 (0, counted=934).
Fix? yes
Free inodes count wrong for group #2 (0, counted=1847).
Fix? yes
... snip ...
Free inodes count wrong for group #288 (486, counted=468).
Fix? yes

Free inodes count wrong (693942, counted=693943).
Fix? yes

root: ***** FILE SYSTEM WAS MODIFIED *****
root: ***** REBOOT LINUX *****

      286777 inodes used (29.24%, out of 980720)
         278 non-contiguous files (0.1%)
         172 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 262385/65
     3738850 blocks used (38.29%, out of 9765511)
           0 bad blocks
           0 large files

      238864 regular files
       21703 directories
         164 character device files
          10 block device files
           1 fifo
  4294967292 links
       26014 symbolic links (24132 fast symbolic links)
          12 sockets
------------
      286397 files

/run# e2fsck -f -v -C0 /dev/sda2
e2fsck 1.43-WIP (1-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

      286777 inodes used (29.24%, out of 980720)
         278 non-contiguous files (0.1%)
         173 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 262385/65
     3738850 blocks used (38.29%, out of 9765511)
           0 bad blocks
           0 large files

      238864 regular files
       21703 directories
         164 character device files
          10 block device files
           1 fifo
          36 links
       26014 symbolic links (24132 fast symbolic links)
          12 sockets
------------
      286804 files
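
One oddity in the two summaries: the first run reports 4294967292
links, the second 36.  The first figure is 2^32 - 4, which suggests
(this is my guess, not something I checked in the e2fsck source) a
32-bit counter that went to -4 and was printed as unsigned.  The sketch
below redoes that arithmetic and also checks that the second run's
categories add up to its "files" total.

```python
# Arithmetic on the e2fsck summary figures quoted above.

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Per-category counts from the second (clean) e2fsck run.
SECOND_RUN = {
    "regular files": 238864,
    "directories": 21703,
    "character device files": 164,
    "block device files": 10,
    "fifos": 1,
    "links": 36,
    "symbolic links": 26014,
    "sockets": 12,
}

if __name__ == "__main__":
    print(as_signed32(4294967292))   # -4
    print(sum(SECOND_RUN.values()))  # 286804
```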