I've been trying to track down a failure with e2fsck on a large filesystem. Running e2fsck-1.45.2 repeatedly and consistently reports an error in pass2:

    Internal error: couldn't find dir_info for <ino>

The inode number is consistent across multiple runs, but examining it with debugfs shows that it is a normal directory with no problems. In order to complete the e2fsck, we tried using "debugfs -w -R 'clri <ino>'" to erase the inode, but this just shifted the error onto a slightly later directory (which would itself then be reported consistently).

The filesystem is 6TB with 2.3B inodes, about 1.8B of which are in use. The inode numbers reporting problems are in the range of 1.2B. e2fsck is using about 45GB of RAM (with some swap on an NVMe SSD, so much faster than the HDD holding the filesystem image). It takes about 3-4h to hit this problem. I was able to run e2fsck under gdb, though conditional breakpoints slow it down significantly (10-100x), since the condition has to be evaluated for every inode; that makes them impractical here.

What I've discovered is quite interesting. ctx->dir_info->array is filled in properly for the bad inode, and the array is OK at the start of pass2 (2 is the root inode, and $66 holds the bad inode number):

    (gdb) p e2fsck_get_dir_info(ctx, 2)
    $171 = (struct dir_info *) 0x7ffabf2bd010
    (gdb) p *$171
    $172 = {ino = 2, dotdot = 0, parent = 2}
    (gdb) p $66
    $173 = 1222630490
    (gdb) p e2fsck_get_dir_info(ctx, $66)
    $174 = (struct dir_info *) 0x7ffb3d327f54
    (gdb) p *$174
    $175 = {ino = 1222630490, dotdot = 0, parent = 0}

The dir_info array itself is over 4GB in size (not sure if this is relevant or not), with 380M directories, but the bad inode is only about halfway through the array ($140 is the index of the problematic entry):

    (gdb) p $140
    $176 = 176197275
    (gdb) p ctx->dir_info->array[$140]
    $177 = {ino = 1222630490, dotdot = 0, parent = 0}
    (gdb) p *ctx->dir_info
    $178 = {count = 381539950, size = 381539965, array = 0x7ffabf2bd010, last_lookup = 0x0, tdb_fn = 0x0, tdb = 0x0}
    (gdb) p $178.count * sizeof(ctx->dir_info->array[0])
    $179 = 4578479400

I put hardware watchpoints on the ".parent" and ".dotdot" fields of this entry so that I could watch as e2fsck_dir_info_set_dotdot() updated it, but I didn't think to put one on the ".ino" field, since I assumed it would be static:

    (gdb) info break
    Num     Type           Disp Enb Address            What
    15      hw watchpoint  keep y                      -location $177->parent
    16      hw watchpoint  keep y                      -location $177->dotdot

The watchpoint triggered, and I saw that the entry was being changed by qsort_r(). At first I thought "OK, the dir_info array needs to be sorted, because a binary search is used on it", but in fact the array is *created* in order during the inode table scan and does not need to be sorted.
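Since the lookups depend on that ordering, here is a minimal sketch of the binary search involved (my own illustration, not the actual dir_info.c code; the struct layout is just what gdb printed above):

    #include <stddef.h>

    typedef unsigned int ext2_ino_t;	/* 32-bit inode number */

    struct dir_info {			/* layout as printed by gdb above */
    	ext2_ino_t ino;
    	ext2_ino_t dotdot;
    	ext2_ino_t parent;
    };

    /*
     * The array is appended to in increasing inode order during the
     * pass1 inode table scan, so it is sorted by .ino purely by
     * construction and never needs an explicit sort.  If something
     * later scribbles on an .ino field, the sort invariant breaks and
     * lookups near that entry start returning NULL.
     */
    static struct dir_info *lookup_dir_info(struct dir_info *array,
    					    size_t count, ext2_ino_t ino)
    {
    	size_t low = 0, high = count;

    	while (low < high) {
    		size_t mid = low + (high - low) / 2;

    		if (array[mid].ino == ino)
    			return &array[mid];
    		if (array[mid].ino < ino)
    			low = mid + 1;
    		else
    			high = mid;
    	}
    	return NULL;
    }

So a single overwritten .ino in the middle of the array is exactly the failure mode that produces "couldn't find dir_info" for an inode that debugfs says is perfectly healthy.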
As can be seen from the stack, it is *another* array being sorted that overwrites the dir_info entry:

    Hardware watchpoint 16: -location $177->dotdot

    Old value = 0
    New value = 1581603653
    0x0000003b2ea381f7 in _quicksort () from /lib64/libc.so.6
    (gdb) p $141
    $179 = {ino = 1222630490, dotdot = 0, parent = 0}
    (gdb) bt
    #0  0x0000003b2ea381e1 in _quicksort () from /lib64/libc.so.6
    #1  0x0000003b2ea38f82 in qsort_r () from /lib64/libc.so.6
    #2  0x000000000045447b in ext2fs_dblist_sort2 (dblist=0x6c9a90, sortfunc=0x428c3d <special_dir_block_cmp>) at dblist.c:189
    #3  0x00000000004285bc in e2fsck_pass2 (ctx=0x6c4000) at pass2.c:180
    #4  0x000000000041646b in e2fsck_run (ctx=0x6c4000) at e2fsck.c:253
    #5  0x0000000000415210 in main (argc=4, argv=0x7fffffffe088) at unix.c:2047
    (gdb) p ctx->dir_info->array[$140]
    $207 = {ino = 0, dotdot = 1581603653, parent = 0}
    (gdb) f 3
    #3  0x00000000004285bc in e2fsck_pass2 (ctx=0x6c4000) at pass2.c:180
    180             ext2fs_dblist_sort2(fs->dblist, special_dir_block_cmp);

AFAIK, ext2fs_dblist_sort2() sorts the list of directory *blocks* and should not be changing dir_info at all. Is this a bug in glibc's qsort()?

What I just noticed while writing this email is that the fs->dblist.list address is right in the middle of the dir_info array's address range:

    (gdb) p *fs->dblist
    $210 = {magic = 2133571340, fs = 0x6c4460, size = 763079922, count = 388821313, sorted = 0, list = 0x7ffad011e010}
    (gdb) p &ctx->dir_info->array[0]
    $211 = (struct dir_info *) 0x7ffabf2bd010
    (gdb) p &ctx->dir_info->array[$140]
    $212 = (struct dir_info *) 0x7ffb3d327f54

which would explain why sorting the dblist is messing with dir_info. I don't _think_ it is a problem with my build or swap, since those differ from the system where this was originally reproduced.
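To double-check that reading of the addresses, here is a throwaway sketch plugging in the values gdb printed above (constants copied from $178, $210, and $211; the final check is only a hypothesis about 32-bit size truncation, which the >4GB array size makes worth ruling out):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
    	/* Values copied from the gdb output above ($178, $210, $211). */
    	uint64_t di_start = 0x7ffabf2bd010ULL;	/* ctx->dir_info->array */
    	uint64_t di_bytes = 381539965ULL * 12;	/* size * sizeof(struct dir_info) */
    	uint64_t db_list  = 0x7ffad011e010ULL;	/* fs->dblist->list */

    	printf("dir_info claims [%#llx, %#llx)\n",
    	       (unsigned long long)di_start,
    	       (unsigned long long)(di_start + di_bytes));
    	printf("dblist->list %#llx is %s that range\n",
    	       (unsigned long long)db_list,
    	       (db_list >= di_start && db_list < di_start + di_bytes) ?
    		"inside" : "outside");

    	/* Hypothesis to rule out: if the byte count were ever
    	 * truncated to 32 bits, the buffer would really end here: */
    	printf("start + (uint32_t)bytes = %#llx\n",
    	       (unsigned long long)(di_start + (uint32_t)di_bytes));
    	return 0;
    }

This confirms that dblist->list lies inside the range dir_info believes it owns, and, possibly coincidentally, start plus the low 32 bits of the byte count lands only a few hundred bytes below dblist->list -- which is the sort of layout I'd expect if some allocation or resize path computed the size in a 32-bit type.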
Any thoughts on how to continue debugging this?

Cheers, Andreas