Re: Many orphaned inodes after resize2fs

Patrik Horník <patrik@xxxxxx> · Sat, 19 Apr 2014 17:42:12 +0200

OK,
so I patched it myself and confirmed that all the errors were caused by this: the patched version does not warn about any inode, or about anything else. So it seems resize2fs did not harm the filesystem at all, and everything was due to the false positive in e2fsck.

I patched it so that an inode is considered part of a suspected corrupted orphan list only if its i_dtime is lower than a fixed constant of around 1.1 billion, which corresponds to some time before my filesystem was created. The patch against 1.42.9 is attached.
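
As a rough illustration of the idea behind the patch (the attached patch itself is against the C source and is not reproduced here), the modified condition could look like the following Python sketch. The exact cutoff value, the name DTIME_CUTOFF and the function name are assumptions; the message only says that the constant is around 1.1 billion. Keeping the original comparison against the inode count alongside the cutoff is also an assumption.

```python
# Hypothetical cutoff: "around 1.1 billion" per the message; the exact
# value used in the attached patch is not stated.
DTIME_CUTOFF = 1_100_000_000


def suspected_orphan(i_dtime: int, inodes_count: int) -> bool:
    """Flag an inode only if its i_dtime is nonzero, could be an inode
    number, and is also below the fixed cutoff."""
    return i_dtime != 0 and i_dtime < inodes_count and i_dtime < DTIME_CUTOFF


inodes_count = 1_281_998_848  # inode count quoted further down in this thread

# A genuine deletion time from 2009 is above the cutoff, so it is not flagged.
print(suspected_orphan(1_250_000_000, inodes_count))  # False
# A small value that really looks like an inode number is still flagged.
print(suspected_orphan(12_345, inodes_count))  # True
```

A fixed cutoff like this only suits a filesystem whose creation date is known, which is why it is described as a private workaround rather than a general fix.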

Please confirm that this is a fully correct solution (for my purposes; it is not meant as a clean, elegant official fix) and that it has no negative consequences. It seems that way, but I did not analyze all the code paths that the patched code is part of.

BTW, were there any other negative consequences of this bug in e2fsck, apart from changing the i_dtime of the inodes to the current time?

Thanks.

Patrik

2014-04-19 1:20 GMT+02:00 Patrik Horník <patrik@xxxxxxxxx>:

Hi,

it seems you got it right! I don't know if you read the email I sent you before posting to the mailing list, but I accidentally diagnosed the cause... :) I noticed that the inodes fsck warned me about, at least the ones I checked, all have all four timestamps in 2010 at the latest...

The filesystem has a maximum of 1281998848 inodes, which as a timestamp falls in August 2010. I don't know how it got that big; I don't think I specified a large value initially, but I have resized it a couple of times. BTW, what is the default group size / inode count ratio? My ratio is not at the maximum you mentioned, but it is not that far from it.
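
For reference, the inode count can be read as a Unix timestamp with a few lines of Python. This is only an arithmetic check of the figure above; the variable names are illustrative.

```python
from datetime import datetime, timezone

inodes_count = 1281998848  # maximum inode count reported above

# Interpret the inode count as seconds since the Unix epoch (UTC).
as_time = datetime.fromtimestamp(inodes_count, tz=timezone.utc)
print(as_time.isoformat())  # 2010-08-16T22:47:28+00:00
```

Any inode deleted before that moment has an i_dtime smaller than the inode count, which matches the observation that the reported inodes all have timestamps no later than 2010.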

So it is almost certainly a false positive caused by the code / bug in e2fsck/pass1.c around line 1070 in the current version. I want to be sure that all these errors were caused by this, so can you please send me a patched version promptly? I could easily patch it myself with some fixed condition, but I don't want to miss something important... BTW, maybe you could compare i_dtime with the filesystem creation timestamp, so you don't have to put a fixed number there.

BTW, I don't know the specifics of ext3; I have only just looked at the sources of the kernel driver and e2fsprogs. But what indicates that an inode is / was created and valid? (I did not need it to find the problematic test you mentioned, I did not see it in the part of the code I looked at, and it is not apparent to me from the definition of struct ext3_inode.)

Thanks.

Patrik

2014-04-18 22:20 GMT+02:00  <tytso@xxxxxxx>:

On Fri, Apr 18, 2014 at 06:56:57PM +0200, Patrik Horník wrote:

> yesterday I experienced the following problem with my ext3 filesystem:
>
> - I had an ext3 filesystem of a few TB in size, with a journal. I correctly
> unmounted it and it was marked clean.
>
> - I then ran fsck.ext3 -f on it and it did not find any problem.
>
> - After increasing the size of its LVM volume by 1.5 TB I resized the
> filesystem with resize2fs lvm_volume and it finished without problem.
>
> - But fsck.ext3 -f immediately after that showed "Inodes that were part of
> a corrupted orphan linked list found." and many thousands of "Inode XXX was
> part of the orphaned inode list." I did not accept the fix. According to
> debugfs, all the inodes I checked from these reported orphaned inodes (I
> checked only some from the beginning of the list of errors) have size 0.

Can you send the output of dumpe2fs -h?  I'm curious how many inodes
you had after the resize, and what file system features might have
been enabled on your file system.

If the only file system corruption errors that you saw were about
the corrupted orphan inode list, then things are probably OK.

What this error message means is that there are i_dtime values which
look like they belong to inode numbers (as opposed to the number of
seconds since January 1, 1970).  So if you ran the system where the
clock was set incorrectly, so that the time was January 1, 1970, and
you deleted a lot of files, you can run into this error --- it's
basically a sanity check that we put in a long time ago to catch
potential file system bugs caused by a corrupted orphan inode list.
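
The check described above can be illustrated with a small Python sketch. This is not the e2fsck source (which is C); the function name looks_like_orphan_link and the sample deletion time are hypothetical, and the logic is reduced to the single comparison described in the paragraph: a nonzero deletion time smaller than the inode count is treated as an inode number.

```python
from datetime import datetime, timezone


def looks_like_orphan_link(i_dtime: int, inodes_count: int) -> bool:
    """Return True if a deletion time could be mistaken for an inode number.

    Simplified model of the sanity check: on the orphan list the i_dtime
    field holds the number of the next inode, so a nonzero value below
    the inode count looks like a list link.
    """
    return i_dtime != 0 and i_dtime < inodes_count


# A file deleted in mid-2009 (hypothetical example timestamp).
deleted_2009 = int(datetime(2009, 6, 1, tzinfo=timezone.utc).timestamp())

# Small filesystem: the inode count is far below any realistic timestamp.
print(looks_like_orphan_link(deleted_2009, 50_000_000))  # False
# Filesystem with 1281998848 inodes (the count reported in this thread).
print(looks_like_orphan_link(deleted_2009, 1_281_998_848))  # True
```

The second call shows the false positive: once the inode count exceeds real deletion timestamps, a perfectly valid i_dtime satisfies the comparison.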

I'm thinking that we should turn off this check if the e2fsck.conf
"broken_system_clock" option is enabled, since if the system has a busted
system clock, this can end up triggering a bunch of scary warnings.

In any case, when you grew the size of the file system, this also
increased the number of inodes, which means it would increase the
likelihood of hitting this bug.  It's also possible that if you
created your file system with the number of inodes per block group
close to the maximum (assuming an average file size of 4k, which would be
highly wasteful of space, so it's not the default), you ended up
with the maximum number of inodes exceeding 1.2 or 1.3 billion inodes,
at which point it would trigger a false positive.  (And indeed, I
should probably put in a fix to e2fsprogs so that if a file system
does have more than 1.2 billion inodes, this check is disabled.)

Cheers,

                                                - Ted

Attachment:
big-fs-fsck-fix.patch

Description: Binary data
_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users