On Thu, Feb 12, 2009 at 10:54:40AM +0100, Vegard Svanberg wrote:
> After a power failure, a ~500G filesystem crashed. Fsck has been running
> for days. The problem seems to be multiply-claimed blocks. Example:
>
> File /directory/file.name/foo (inode #1234567, mod time Tue Feb
> 10 08:14:40 2008)
> has 1800000 multiply-claimed block(s), shared with 1 file(s):
>
> /directory/file.name/bar
> (inode #1234567, mod time Wed Dec 1 15:30:00 2008)
> Clone multiply-claimed blocks? y
>
> This takes like forever, probably due to the large number of
> multiply-claimed blocks.

You are using a version of e2fsprogs/e2fsck newer than 1.28, right?
If not, there's your problem; upgrade to something newer. Older
e2fsck's had O(n**2) algorithms that made this very slow, causing this
pass to be CPU-bound.

It could also be slow because of memory pressure; the data structures
for keeping track of all of those blocks aren't small.

> I was wondering if:
>
> - I can get a list of the impacted files/inodes

Yes, you can; they were listed by e2fsck during pass 1B. Look for
entries like this:

Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 12: 25 26
Multiply-claimed block(s) in inode 13: 25 26 57 58
Multiply-claimed block(s) in inode 14: 57 58

> - Wipe them with debugfs

You could wipe them all out via debugfs's clri function, like this:

    debugfs -w -R "clri <12> <13> <14>" /dev/sdXX

(The -w opens the filesystem read-write, which clri requires.) The
angle brackets indicate that you are passing in an inode number
instead of a pathname, and I've left it as an exercise to the reader
how to use your choice of tools (emacs, grep/awk, perl) to pull out
the necessary inode numbers from e2fsck's Pass 1B output. Then run
e2fsck, and it will clear the resulting inodes.

To get the filenames, do this first, before the clri command:

    debugfs -R "ncheck 12 13 14" /dev/sdXX

(No angle brackets are needed because ncheck only takes inode numbers
and converts them to pathnames.)

> Is this safe? How do I do it?
> Fsck says it's 538 inodes with this
> problem. If I could get a file list and be able to wipe the inodes, I
> could restore the missing files from backup and get the machine online
> again quickly.

However, it's not strictly necessary to wipe all 538 inodes. It's
likely that you only need to wipe approximately half of them. What
probably happened is that the disk drive got confused and wrote data
to the wrong location on disk. Or the journal was corrupted (one of
the reasons why ext4 has journal checksums), so inode table blocks got
written to the wrong place on disk. That means what you'll see is
something like this:

Multiply-claimed block(s) in inode 32: 200 201 203
Multiply-claimed block(s) in inode 33: 210 211 212 213 214
Multiply-claimed block(s) in inode 34: 215 216 217 218
...
Multiply-claimed block(s) in inode 128: 200 201 203
Multiply-claimed block(s) in inode 129: 210 211 212 213 214
Multiply-claimed block(s) in inode 130: 215 216 217 218

You may not see a full 32 or 16 inodes in each group of duplicate
inodes (there are 32 inodes in each 4k block with 128-byte inodes, or
16 per 4k block if you are using 256-byte inodes), since some inodes
may have been deleted or never allocated.

In any case, only one set of inodes will be correct. After you
determine which set seems correct, given the mapping between pathnames
and file contents, you can clri the other set. Or, if that's too much
effort, you can clri them all and recover them from backups.

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
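The "exercise for the reader" above (pulling the inode numbers out of
the Pass 1B output) could be sketched roughly as follows. This is only
an illustration, not part of e2fsprogs: the helper names (parse_pass1b,
sharing_partners, ncheck_command, clri_commands) are made up for this
example, and it assumes the log lines look exactly like the samples
quoted in this message, with plain space-separated block numbers. It
only builds debugfs command strings; it never touches a device.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the samples in this message:
#   Multiply-claimed block(s) in inode 13: 25 26 57 58
PASS1B_LINE = re.compile(r"^Multiply-claimed block\(s\) in inode (\d+):(.*)$")

SAMPLE = """\
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 12: 25 26
Multiply-claimed block(s) in inode 13: 25 26 57 58
Multiply-claimed block(s) in inode 14: 57 58
"""


def parse_pass1b(log_text):
    """Return {inode: [block, ...]} for every Pass 1B line in log_text."""
    claims = {}
    for line in log_text.splitlines():
        m = PASS1B_LINE.match(line.strip())
        if not m:
            continue  # skip headers and unrelated e2fsck output
        inode = int(m.group(1))
        # Keep only plain integer tokens; anything else is ignored.
        blocks = [int(tok) for tok in m.group(2).split() if tok.isdigit()]
        claims.setdefault(inode, []).extend(blocks)
    return claims


def sharing_partners(claims):
    """Map each inode to the sorted list of other inodes sharing its blocks."""
    owners = defaultdict(set)
    for inode, blocks in claims.items():
        for block in blocks:
            owners[block].add(inode)
    partners = {inode: set() for inode in claims}
    for inodes in owners.values():
        for inode in inodes:
            partners[inode] |= inodes - {inode}
    return {inode: sorted(others) for inode, others in partners.items()}


def ncheck_command(inodes):
    """Build one ncheck command; ncheck takes bare inode numbers."""
    return "ncheck " + " ".join(str(i) for i in sorted(inodes))


def clri_commands(inodes):
    """Build one clri command per inode, using the <inode> notation.

    One command per inode is emitted because the debugfs man page
    documents clri as taking a single filespec.
    """
    return ["clri <%d>" % i for i in sorted(inodes)]


if __name__ == "__main__":
    claims = parse_pass1b(SAMPLE)
    print(ncheck_command(claims))
    for inode, others in sorted(sharing_partners(claims).items()):
        print("inode", inode, "shares blocks with", others)
    print("\n".join(clri_commands(claims)))
```

The sharing_partners output is meant to help with the "which half is
correct" question: it pairs each inode with the inodes that claim the
same blocks, so the two candidate sets can be compared before anything
is cleared. One way to use the clri lines would be to review them,
save them to a file, and pass that file to debugfs with -w and -f; the
ncheck line should be run first so the pathnames are recorded.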