On 2022/3/16 1:54, Theodore Ts'o wrote:
On Tue, Mar 15, 2022 at 04:01:45PM +0800, zhanchengbin wrote:
If the system crashes while a file is being truncated, we get a
problematic inode, and it is added to fs->super->s_last_orphan.
When we run `e2fsck -a img`, the s_last_orphan list is traversed
and processed. During this pass, orphan inodes with
i_links_count == 0 are deleted, while orphan inodes with
i_links_count != 0 (e.g. the truncated inode) are not. However,
even when some orphan inodes with i_links_count != 0 remain,
EXT2_VALID_FS is still assigned to fs->super->s_state, so the
deeper checks are skipped despite the remaining inconsistencies.
That's not supposed to happen. We regularly put inodes on the orphan
list when they are being truncated so that if we crash, the truncation
operation can be completed as part of the journal recovery and remount
operation. This is true regardless of whether the recovery is done by
e2fsck or by the kernel.
Yes, you are right.
The truncate has been completed, and the file ACL has been set to zero
in release_inode_blocks(), but the ACL blocks were not subtracted from
i_blocks, so i_blocks is inconsistent.
Li Jinlin sent a patch yesterday to fix it.
If a crash during a truncate leads to an inconsistent file system
after the file system is mounted, or after e2fsck does the journal
replay and orphan inode list processing, that's a kernel bug, and we
should fix the bug in the kernel.
Do you have a reliable reproducer for this situation?
I have a reproducer, but it does not reproduce the problem reliably:
#!/bin/bash
disk_list=$(multipath -ll | grep filedisk | awk '{print $1}')
for disk in ${disk_list}
do
    mkfs.ext4 -F /dev/mapper/$disk
    mkdir ${disk}
done

function err_inject()
{
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
}

count=0
while true
do
    ((count=count+1))
    for disk in ${disk_list}
    do
        while true
        do
            mount -o data_err=abort,errors=remount-ro /dev/mapper/$disk $disk && break
            sleep 0.1
        done
        nohup fsstress -d $(pwd)/$disk -l 10 -n 1000 -p 10 &>/dev/null &
    done
    sleep 5
    for disk in ${disk_list}
    do
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        #if [ "${aqu_sz}" == "0.00" -o "$util" == "0.00" ];then
        #    iostat -x 1 -d 2
        #    exit 1
        #fi
        mount | grep $disk | grep '(ro' && exit 1
    done
    err_inject
    while [ -n "`pidof fsstress`" ]
    do
        sleep 1
    done
    for disk in ${disk_list}
    do
        umount $disk
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        if [ "${aqu_sz}" != "0.00" -o "$util" != "0.00" ];then
            iostat -x 1 -d 2
            exit 1
        fi
        dd bs=1M if=/dev/mapper/$disk of=/root/dockerback
        fsck.ext4 -a /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 -a $ret -ne 1 ]; then
            exit 1
        fi
        fsck.ext4 -fn /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 ]; then
            exit 1
        fi
    done
    if [ $count -gt 5 ];then
        echo 3 > /proc/sys/vm/drop_caches
        sleep 1
        cat /proc/meminfo >> mem.txt
        echo "" >> mem.txt
        slabtop -o >> slab.txt
        echo "" >> slab.txt
        count=0
    fi
done
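The pass/fail criterion in the script is the pair of fsck.ext4 runs at
the end: the preen pass (-a) is allowed to exit 0 or 1, but if the
subsequent forced read-only check (-fn) exits nonzero, the filesystem
was marked clean while still being inconsistent, which is the reported
problem. A minimal sketch of that decision logic (the helper name is
mine, not part of the reproducer):

```shell
# check_fsck_codes PREEN FORCED
# Returns 0 only if the preen pass (-a) exited 0 or 1 AND the forced
# read-only check (-fn) exited 0; any other combination means either
# the preen pass failed hard, or the fs was marked clean while a full
# check still found inconsistencies.
check_fsck_codes() {
    local preen=$1 forced=$2
    if [ "$preen" -ne 0 ] && [ "$preen" -ne 1 ]; then
        return 1    # preen pass itself failed
    fi
    if [ "$forced" -ne 0 ]; then
        return 1    # fs marked clean but forced check found problems
    fi
    return 0
}
```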
Thanks,
- Ted