On 2022/3/16 1:54, Theodore Ts'o wrote:
On Tue, Mar 15, 2022 at 04:01:45PM +0800, zhanchengbin wrote:
If the system crashes while a file is being truncated, we get a
problematic inode, and it is added to fs->super->s_last_orphan.
When we run `e2fsck -a img`, the s_last_orphan list is traversed
and processed. During this pass, orphan inodes with
i_links_count == 0 are deleted, while orphan inodes with
i_links_count != 0 (e.g. the truncated inode) are not. However,
even when some orphan inodes with i_links_count != 0 remain,
EXT2_VALID_FS is still assigned to fs->super->s_state, so the
deeper checks are skipped despite the remaining inconsistencies.
That's not supposed to happen. We regularly put inodes on the orphan
list when they are being truncated so that if we crash, the truncation
operation can be completed as part of the journal recovery and remount
operation. This is true regardless of whether the recovery is done by
e2fsck or by the kernel.
Yes, you are right.
The truncate has been completed, and the file ACL has been set to zero
in release_inode_blocks(), but the ACL blocks were not subtracted from
i_blocks, so i_blocks is inconsistent.
Li Jinlin sent a patch yesterday to fix it.
If a crash during a truncate leads to an inconsistent file system
after the file system is mounted, or after e2fsck does the journal
replay and orphan inode list processing, that's a kernel bug, and we
should fix the bug in the kernel.
Do you have a reliable reproducer for this situation?
I have a reproducer, but it does not reproduce the problem reliably:
#!/bin/bash
disk_list=$(multipath -ll | grep filedisk | awk '{print $1}')
for disk in ${disk_list}
do
    mkfs.ext4 -F /dev/mapper/$disk
    mkdir ${disk}
done

function err_inject()
{
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
}

count=0
while true
do
    ((count=count+1))
    for disk in ${disk_list}
    do
        while true
        do
            mount -o data_err=abort,errors=remount-ro /dev/mapper/$disk $disk && break
            sleep 0.1
        done
        nohup fsstress -d $(pwd)/$disk -l 10 -n 1000 -p 10 &>/dev/null &
    done
    sleep 5
    for disk in ${disk_list}
    do
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        #if [ "${aqu_sz}" == "0.00" -o "$util" == "0.00" ];then
        #    iostat -x 1 -d 2
        #    exit 1
        #fi
        mount | grep $disk | grep '(ro' && exit 1
    done
    err_inject
    while [ -n "`pidof fsstress`" ]
    do
        sleep 1
    done
    for disk in ${disk_list}
    do
        umount $disk
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        if [ "${aqu_sz}" != "0.00" -o "$util" != "0.00" ];then
            iostat -x 1 -d 2
            exit 1
        fi
        dd bs=1M if=/dev/mapper/$disk of=/root/dockerback
        fsck.ext4 -a /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 -a $ret -ne 1 ]; then
            exit 1
        fi
        fsck.ext4 -fn /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 ]; then
            exit 1
        fi
    done
    if [ $count -gt 5 ];then
        echo 3 > /proc/sys/vm/drop_caches
        sleep 1
        cat /proc/meminfo >> mem.txt
        echo "" >> mem.txt
        slabtop -o >> slab.txt
        echo "" >> slab.txt
        count=0
    fi
done
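The pass/fail criterion in the script is the pair of fsck.ext4 runs at
the end: the preen pass (-a) is allowed to exit 0 or 1, but if the
subsequent forced read-only check (-fn) exits nonzero, the filesystem
was marked clean while still being inconsistent, which is the reported
problem. A minimal sketch of that decision logic (the helper name is
mine, not part of the reproducer):

```shell
# check_fsck_codes PREEN FORCED
# Returns 0 only if the preen pass (-a) exited 0 or 1 AND the forced
# read-only check (-fn) exited 0; any other combination means either
# the preen pass failed hard, or the fs was marked clean while a full
# check still found inconsistencies.
check_fsck_codes() {
    local preen=$1 forced=$2
    if [ "$preen" -ne 0 ] && [ "$preen" -ne 1 ]; then
        return 1    # preen pass itself failed
    fi
    if [ "$forced" -ne 0 ]; then
        return 1    # fs marked clean but forced check found problems
    fi
    return 0
}
```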
Thanks,
- Ted