On Fri, May 13, 2005 at 12:35:16AM +0200, Hans Yperman wrote:
> This tragic history starts actually on windows: MS Word had wiped out
> an important file on a floppy, and I got the task of retrieving what
> was possible. Using Linux, I made an image with dd, and put it on the
> now extinct EXT3 partition. I used an undelete program, and then
> mounted the image with a loopback device:
>   mount -o loop /tmp/image.img /floppy
> As it turns out, the undeleter managed to screw up the FAT, and the
> loopback device complains about reading past the end of the device.
> After fixing the floppy on another computer, I come back to the linux
> computer. The console is full of error messages.

What version of the kernel are you using? What undelete program were
you using? Most undelete programs don't require that you mount the
filesystem; in fact, they often require that you *don't* mount it.

> What happened? A first bug: Linux remounted the loopback-device
> read-only because of the bad FAT on the image. BUT this did not work
> out right: not only the loopback device, but the whole EXT3-partition
> were now read-only. Every little write action results in an error,
> hence all the messages. I did not really think much of it at that
> point, and just did a
>   mount -o remount,rw /

Without the logs, it sounds like the ext3 filesystem got corrupted,
and so it was remounted read-only. How this happened is not clear,
and you didn't give us enough information to determine that; but it's
consistent with e2fsck displaying errors.

> At this point, I am already screwed, but I don't realize it yet: The
> computer works completely normal from here on. The problem happens
> the next time I boot: fsck complains about problems (weird, fsck is
> not supposed to run for EXT3).

When the kernel discovers filesystem corruption, it marks the
filesystem as containing errors, and remounts it read-only.
When fsck runs, it will note the fact that the filesystem has
problems, and try to fix it.

> Specifically, fsck complains about
> double-allocated blocks, does a pass 1B and 1C (I'd never seen these
> before either), dumps pages and pages and pages of block numbers,
> gets very very veeeeryyy slow, and crashes. I restart fsck. This
> time it starts asking me tons of yes/no questions because it wants to
> know what to do with the double-allocated block. I yes them all
> (There is no real right answer anyhow) and reboot.

What version of e2fsck are you running? It must be an ancient one if
it got really slow like that. You wouldn't be running Debian
Obsolete^H^H^H^H^H^H^H Stable, would you?

> And that was it: init starts, and complains about not having an
> /etc/inittab (and asks me which runlevel to start. Never seen that
> before either). Then it crashes. Booting with knoppix reveals lots
> and lots of damaged files. Everything that was cached seems to be
> damaged, and some random files are also dead (my guess is ext3 screwed
> up while updating atimes or something like that). Game over.

The filesystem was probably screwed up much earlier than that.
Probably it happened when the undelete program was run, or perhaps
because you remounted the filesystem read-write after errors were
uncovered, but it's going to be hard to reconstruct without a lot more
details. (What specific messages were printed by the kernel
describing the errors, exactly what version of the kernel, e2fsprogs,
and undelete program you were using, etc.)

I will say that while remounting a filesystem read/write after errors
is dangerous, the fact that e2fsck displayed pages and pages of block
numbers tends to indicate that there was something more that went
wrong.
Merely remounting a filesystem read/write might result in some
multiply claimed blocks, which passes 1b/1c/1d are designed to
resolve, but how many you have depends on how many files were written
and how badly corrupted the block allocation bitmaps were. Assuming
that you didn't run the system for very long before you rebooted, or
didn't write a lot of files during this interim, it seems somewhat
unlikely that it would have resulted in "pages and pages and pages" of
block numbers. That would tend to argue that portions of the inode
table got written to the wrong location, which is generally caused by
a hardware error.

It might have been caused by the undelete program, but that seems hard
to believe. But then again, I don't know which undelete program you
used, and it does seem very surprising that the undelete program would
work with a mounted filesystem, so that part sounds like another user
error (but not one that would be expected to cause major filesystem
corruption).

So the bottom line is I can't really tell you what could have happened
with the limited facts that you've given me.

> I guess these 2 facts need fixing:
> 1) loopback devices should not pass errors over to their underlying
> filesystems.

Loopback devices don't pass errors back over to their underlying
filesystems.

> 2) ext3 suicidally allows remounting read-write when parts of its data
> are invalid.

Linux will allow you to do many things that might be, well,
ill-advised. When the kernel printed all of those warnings, it was
telling you that the filesystem had errors. Remounting it read/write
was a really bad idea, but then again, so is running the command
"dd if=/dev/zero of=/dev/hda1" as root.

> Other people might not like losing a whole partition, so I mail this
> sad story to you all. A bit of advice: if you ever see ext3
> complaining about being read-only, press the reset button. It might
> save your partition.

Or run e2fsck manually yourself; there are a number of things that you
can do.
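
As a rough sketch of what checking by hand while keeping a record could
look like (assuming a POSIX shell; the helper name save_output, the log
paths and the device /dev/hda1 are placeholders for illustration, not
details taken from the report). Running e2fsck with -n opens the
filesystem read-only and answers "no" to every question, so it only
reports what it finds:

```shell
# save_output LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, show its output on the terminal, and
# keep a copy (stdout and stderr together) in LOGFILE.
# Note: the return status is that of tee, not of COMMAND.
save_output() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Example use, as root, with the damaged filesystem unmounted or still
# read-only. The paths and device name below are placeholders; write the
# logs to a different filesystem than the damaged one.
#   save_output /mnt/other/kernel-messages.log dmesg
#   save_output /mnt/other/e2fsck.log e2fsck -f -n /dev/hda1
```

Once the logs are saved somewhere safe, they can be posted along with
the kernel and e2fsprogs versions.
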
Blindly remounting the filesystem read/write is certainly not one of
them. Saving all of the error messages from the kernel describing the
filesystem corruption is a really good idea, as is saving the messages
from e2fsck, so people can figure out what happened after the fact.

The one good thing is that you kept good backups, so you didn't lose
that much; I definitely commend that. :-)

						- Ted

_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users