ext3 bug?


 



Hi,
I have a severe filesystem crash on a machine with a single ext3 filesystem.
This is the second time it has happened to me (on two different machines that
are identical in hardware setup and mostly in software - I cloned them when
I started to work).

It looks like the journal (or something else, I am not an expert here) has
gone wild and overwritten loads of inodes. I have a single filesystem for
everything (I know, a bad habit), which has been remounted read-only. Using an
SSH tunnel that is still open, I had a look at the fs and found this:

[root@wally /]# ls -l sbin/
ls: sbin/ipfwadm: Eingabe-/Ausgabefehler
ls: sbin/depmod.modutils: Eingabe-/Ausgabefehler
ls: sbin/update-modules.modutils: Eingabe-/Ausgabefehler
ls: sbin/modprobe.Lmodutils: Eingabe-/Ausgabefehler
ls: sbin/mount.smbfs: Eingabe-/Ausgabefehler
ls: sbin/mount.smb: Eingabe-/Ausgabefehler
insgesamt 1075231521
-rwxr-xr-x    1 root     root        14408 22. Mär 2002  badblocks
-rwxr-xr-x    1 root     root         7360 31. Mai 2003  blockdev
-rwxr-xr-x    1 root     root        47944 31. Mai 2003  cfdisk
[... normal entries salvaged]
-rwxr-xr-x    1 root     root          413 29. Mai 2002  fsck.nfs
lrwxrwxrwx    1 root     root            7 20. Feb 2003  fsck.vfat -> dosfsck
?rwxrw-rwt  13639 21845    18754          23 27. Jän 1994  genksyms
-rwxr-xr-x    1 root     root        14184 31. Mai 2003  getty
-rwxr-xr-x    1 root     root       121032 16. Apr 2003  grub
-rwxr-xr-x    1 root     root         2944 16. Apr 2003  grub-floppy
-rwxr-xr-x    1 root     root        12110 16. Apr 2003  grub-install
-rwxr-xr-x    1 root     root         2301 16. Apr 2003  grub-md5-crypt
-rwxr-xr-x    1 root     root         2473 16. Apr 2003  grub-terminfo
-rwxr-xr-x    1 root     root         9104 29. Mai 2002  halt
?-----xr-x  2240 1931476992 1952802008 1700749935  1. Jän 1970  hciattach
cr-Srwsr-T  2240 487270   25816      5,  16  1. Jän 1970  hciconfig
-rwxr-xr-x    1 root     root        27048 10. Apr 2003  hcid
[...]
-rwxr-xr-x    1 root     root        14092 24. Nov 2001  iptunnel
lrwxrwxrwx    1 root     root           15 23. Jul 08:53 kallsyms -> insmod.modutils
--wsrwxrw-  16406 3257942038 1075218782 4618131408204237070 27. Jän 2004  kallsyms.modutils
-rwxr-xr-x    1 root     root         5776 12. Nov 2001  kbdrate
?rwxrw-rwt  16896 21845    12079          26 27. Jän 1994  kernelversion
-rwxr-xr-x    1 root     root         8948 29. Mai 2002  killall5
-rwxr-xr-x    1 root     root        19672  3. Jän 2002  klogd
lrwxrwxrwx    1 root     root           15 23. Jul 08:53 ksyms -> insmod.modutils
sr-Srw----  16407 3423617046 1075238984 1075656696 27. Jän 2004  ksyms.modutils
-rwxr-xr-x    1 root     root       430636  8. Apr 2003  ldconfig
-rwxr-xr-x    1 root     root        20600 31. Mai 2003  losetup
lrwxrwxrwx    1 root     root           10 23. Jul 08:53 lsmod -> /bin/lsmod
?-wxrwSrwx  2063 21845    36537          73  1. Jän 1970  lsmod.Lmodutils
s--x--S---  16406 131072   10376          13  1. Jän 1970  lsmod.modutils
--wxrwsrwT  16406 2903523350 1074366302 4614300640973588238 27. Jän 2004  lspci
-rwxr-xr-x    1 root     root        43485  3. Mär 2002  MAKEDEV
-rwxr-xr-x    1 root     root        10188 24. Nov 2001  mii-tool
[...]
lrwxrwxrwx    1 root     root            7 20. Feb 2003  mkfs.msdos -> mkdosfs
lrwxrwxrwx    1 root     root            7 20. Feb 2003  mkfs.vfat -> mkdosfs
?-ws------  16407 85999638 1075773488 1075233008 28. Jän 2004  mkswap
-rwxr-xr-x    1 root     root         8752 10. Jul 2003  modinfo
-rwxr-xr-x    1 root     root        39912 10. Apr 2003  modinfo.modutils
-rwxr-xr-x    1 root     root        19796 10. Jul 2003  modprobe
lrwx---rwT  2053 2209351685 1074963494 134587350 23. Jän 2004  modprobe.Lmodutils
lrwxrwxrwx    1 root     root           15  4. Apr 2003  modprobe.modutils -> insmod.modutils
-rwxr-xr-x    1 root     root         7108 24. Nov 2001  nameif
-rwxr-xr-x    1 root     root         2808 31. Mai 2003  pivot_root
-rwxr-xr-x    1 root     root         4716 24. Nov 2001  plipconfig
-rwxr-xr-x    1 root     root         3324 18. Mär 2001  pmap_dump
-rwxr-xr-x    1 root     root         3372 18. Mär 2001  pmap_set
-rwxr-xr-x    1 root     root        11692 18. Mär 2001  portmap
?-ws--s--T  16409 694698007 1075434216 1075242124 31. Jän 2004  poweroff
-rwxr-xr-x    1 root     root        18860 24. Nov 2001  rarp
-rwxr-xr-x    1 root     root         5088 31. Mai 2003  raw
?rwSrwx--T  16406 113786907 1075297776 1075751024 27. Jän 2004  reboot
-rwxr-xr-x    1 root     root        19056 22. Mär 2002  resize2fs
-rwxr-xr-x    1 root     root         7812 10. Jul 2003  rmmod
lrwxrwxrwx    1 root     root            6 16. Apr 2003  rmmod.Lmodutils -> insmod
lrwxrwxrwx    1 root     root           15  4. Apr 2003  rmmod.modutils -> insmod.modutils
?--xr----T  16408 276316184 1075252624 1075773640 27. Jän 2004  rmt
-rwxr-xr-x    1 root     root        42060 24. Nov 2001  route
-rwxr-xr-x    1 root     root         3196  9. Jul 2003  rpc.lockd
[..]
-rwxr-xr-x    1 root     root       153792  2. Apr 2002  tc
?rws--S---  16413 3167764512 1075212976 1075238948  2. Feb 2004  telinit
-rwxr-xr-x    1 root     root        22312 22. Mär 2002  tune2fs
-rwsr-xr-x    1 root     root        14508 21. Jän 2002  unix_chkpwd
-rwxr-xr-x    1 root     root        16560 16. Apr 2003  update-grub
-rwxr-xr-x    1 root     root         2807 10. Jul 2003  update-modules

The same damage shows up throughout the fs - /lib, /usr/bin, /etc and others are all a mess.
Luckily I could get hold of the database files and the web content that my
students were working on...
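
To get an overview of how far the damage reaches, the tree could be walked
with a small script like the one below. This is only a rough sketch, not a
replacement for e2fsck: the names scan_tree and classify and the link-count
threshold are made up for illustration, and it only flags the two symptoms
visible in the listing above (entries that cannot be stat'ed, and entries
with an unknown file type or an absurd link count).

```python
import errno
import os
import stat

# All file-type predicates the stat module knows; anything else shows up as '?' in ls.
KNOWN_TYPES = (stat.S_ISREG, stat.S_ISDIR, stat.S_ISLNK, stat.S_ISCHR,
               stat.S_ISBLK, stat.S_ISFIFO, stat.S_ISSOCK)


def classify(mode, nlink, max_links=1000):
    """Return a reason string if the metadata looks damaged, else None.

    max_links is an arbitrary threshold chosen for this sketch.
    """
    if not any(check(mode) for check in KNOWN_TYPES):
        return "unknown file type"
    # Directories legitimately have many links, so only check other types.
    if not stat.S_ISDIR(mode) and nlink > max_links:
        return "implausible link count %d" % nlink
    return None


def scan_tree(root, max_links=1000):
    """Yield (path, reason) for every suspicious entry below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError as exc:
                code = errno.errorcode.get(exc.errno, str(exc.errno))
                yield path, "lstat failed: " + code
                continue
            reason = classify(st.st_mode, st.st_nlink, max_links)
            if reason:
                yield path, reason


if __name__ == "__main__":
    for path, reason in scan_tree("/sbin"):
        print(path, "-", reason)
```

It only reads metadata, so it should be safe on the read-only mount, provided
a working Python interpreter is available (e.g. from a boot CD).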

I suspect the crash is related to large files: I had a 2.6 GB file in /tmp
(a gzipped partition) that I copied over to another machine, and that night
it crashed... It had run fine for the last year, though.

A similar incident happened on my first machine, where I moved a hard disk image
to an external FireWire disk. After deleting it from my machine, I was left with a
totally broken system that I could only bring up again with a boot CD, and after an
fs check I still have 16000 files in /lost+found (around 1.5 GB of data...)

The hardware on the machine looks like this:
00:00.0 Host bridge: Intel Corp. 82850 850 (Tehama) Chipset Host Bridge (MCH) (rev 04)
00:01.0 PCI bridge: Intel Corp. 82850 850 (Tehama) Chipset AGP Bridge (rev 04)
00:1e.0 PCI bridge: Intel Corp. 82820 820 (Camino 2) Chipset PCI (rev 04)
00:1f.0 ISA bridge: Intel Corp. 82820 820 (Camino 2) Chipset ISA Bridge (ICH2) (rev 04)
00:1f.1 IDE interface: Intel Corp. 82820 820 (Camino 2) Chipset IDE U100 (rev 04)
00:1f.2 USB Controller: Intel Corp. 82820 820 (Camino 2) Chipset USB (Hub A) (rev 04)
00:1f.3 SMBus: Intel Corp. 82820 820 (Camino 2) Chipset SMBus (rev 04)
00:1f.4 USB Controller: Intel Corp. 82820 820 (Camino 2) Chipset USB (Hub B) (rev 04)
00:1f.5 Multimedia audio controller: Intel Corp. 82820 820 (Camino 2) Chipset AC'97 Audio Controller (rev 04)
01:00.0 VGA compatible controller: Matrox Graphics, Inc.: Unknown device 0527 (rev 03)
02:04.0 USB Controller: NEC Corporation USB (rev 41)
02:04.1 USB Controller: NEC Corporation USB (rev 41)
02:04.2 USB Controller: NEC Corporation: Unknown device 00e0 (rev 02)
02:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 78)
02:0c.0 FireWire (IEEE 1394): VIA Technologies, Inc. OHCI Compliant IEEE 1394 Host Controller (rev 46)
[vogl /net/home2/vogl]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping : 7
cpu MHz : 2405.505
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 4797.23
[vogl /net/home2/vogl]$ cat /proc/interrupts
CPU0
0: 23797604 IO-APIC-edge timer
1: 126073 IO-APIC-edge keyboard
2: 0 XT-PIC cascade
4: 2 IO-APIC-edge serial
8: 4 IO-APIC-edge rtc
14: 355574 IO-APIC-edge ide0
15: 5 IO-APIC-edge ide1
16: 0 IO-APIC-level Matrox Graphics, Inc. MGA Parhelia AGP
17: 3984 IO-APIC-level Intel 82801BA-ICH2
19: 18062239 IO-APIC-level usb-uhci
20: 169 IO-APIC-level ohci1394
22: 0 IO-APIC-level acpi
23: 10021486 IO-APIC-level eth0, ehci_hcd, usb-uhci
NMI: 0
LOC: 23796986
ERR: 0
MIS: 0


[root@wally /]# uptime
 10:02:55 up 23 days, 23:48,  4 users,  load average: 0.00, 0.00, 0.00
[root@wally /]# uname -a
Linux wally 2.4.23 #1 Tue Dec 23 09:44:22 CET 2003 i686 unknown
[root@wally /]# mount
/dev/ide/host0/bus0/target0/lun0/part1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw)
/dev/ide/host0/bus0/target0/lun0/part5 on /media type vfat (rw,noexec,nodev,gid=2000,umask=002)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
automount(pid1324) on /data/net type autofs (rw,fd=5,pgrp=1324,minproto=2,maxproto=4)
automount(pid1348) on /net type autofs (rw,fd=5,pgrp=1348,minproto=2,maxproto=4)
automount(pid1319) on /var/autofs/misc type autofs (rw,fd=5,pgrp=1319,minproto=2,maxproto=4)


Is there anything I can do before reformatting the disk to shed some light on this?

Simon


--
_______________________________________________________________________
Dr. Simon Vogl
Institut für Pervasive Computing, Johannes Kepler Universität Linz
Altenberger Straße 69, A-4040 Linz, Austria

Tel: +43 732 2468-8517, Fax: +43 732 2468-8426
mailto: vogl@xxxxxxxxxxxxxxxxxxx,  http://www.soft.uni-linz.ac.at/







_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
