Re: Ext4 corruption with VM images as 3 > drop_caches

Ritesh Harjani <riteshh@xxxxxxxxxxxxx> · Sat, 21 Mar 2020 08:52:40 +0530

On 3/20/20 5:19 PM, Jan Kara wrote:
On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
On 3/19/20 6:54 PM, Ritesh Harjani wrote:
On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
Hi,

With a new VM install I am finding corruption in the VM image if I
follow up the install with echo 3 > /proc/sys/vm/drop_caches

The file system reports the error below.

Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ...
[    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
#787185: comm sh: iget: checksum invalid
done.
[    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
#917954: comm init: iget: checksum invalid
[    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
#917954: comm init: iget: checksum invalid
/sbin/init: error while loading shared libraries: libc.so.6: cannot
open shared object file: Error 74
[    5.271207] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00007f00

And debugfs reports

debugfs:  stat <917954>
Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
Size of extra inode fields: 0
Inode checksum: 0x00000000
BLOCKS:
debugfs:

Bisecting this finds
Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
dioread_nolock the default")
as bad. If I revert the same on top of linus
upstream(fb33c6510d5595144d585aa194d377cf74d31911)
I don't hit the corruption anymore.

I tried replicating this and could easily reproduce it on a Power box.
I tried on x86 too, but could not reproduce it there.
Now one difference on Power could be that pagesize is 64K and fs
blocksize is 4K.
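
To check this page size vs. block size difference on a given host, a
minimal Python sketch follows. The helper name page_and_block_size is
made up for illustration, and it assumes that statvfs f_bsize reflects
the filesystem block size, which is what I would expect for ext4 on
Linux but is not guaranteed for every filesystem.

```python
import os


def page_and_block_size(path="."):
    """Return (page_size, fs_block_size) in bytes for the fs holding `path`.

    Hypothetical helper, not part of any tool mentioned in this thread.
    """
    # Kernel page size, e.g. 65536 on the Power box, 4096 on x86.
    page_size = os.sysconf("SC_PAGE_SIZE")
    # Assumption: f_bsize is the filesystem block size for this mount.
    block_size = os.statvfs(path).f_bsize
    return page_size, block_size
```

Calling page_and_block_size() on the directory holding the qemu image
and comparing the two values shows whether the host is in the
blocksize < pagesize case.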

The issue looks like the guest qemu image file is not properly written
back, after host does echo 3 > drop_caches. (correct me if this is not
the case).

OK. So I tried this with the "cache=directsync" parameter passed to the
drive file. This parameter says it should bypass the host side page
cache. With this parameter, I don't see this issue on the Power box.

OK, so this likely means that there is something hosed in the writeback
path using unwritten extents when blocksize < pagesize. Maybe we miss some
conversion of unwritten extent to a written one and thus after dropping
caches we effectively lose data?

Yes, that seems like it. I will try and create a small test case
considering this. I will also go over the unwritten to written path and
check what I missed there.

Thanks
ritesh

I tried replicating this via the test below, but could not reproduce it.

Any idea what kind of unit test could be written for this?
I am not sure how exactly qemu is writing to its image file.

1. Create 2 files. "mmap-file", "mmap-data".
2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried
with both 64KB-aligned and 4KB-aligned offsets), write a
pagesize/blocksize-sized chunk of a known data pattern through the mmap.
3. These offsets (which are pagesize/blocksize aligned) are recorded into
"mmap-data" file via normal read/write calls.
4. Then after we wrote to both files, we munmap the "mmap-file" and
close both of these files.
5. Then we do echo 3 > drop_caches.
6. Then in the verify phase, using the offsets written in "mmap-data"
file, I read the "mmap-file" to verify whether its contents are correct
or not.
With that, I could not reproduce this issue.
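
For reference, the steps above can be sketched roughly as below. This
is a minimal Python approximation, not the actual test program: the
function names, the data pattern (the offset repeated as a 64-bit
value) and the seed are all assumptions made for illustration. The
drop_caches step needs root and is kept as a separate function.

```python
import mmap
import os
import random
import struct


def pattern(offset, chunk):
    # Assumed pattern: the chunk's own offset repeated as little-endian u64.
    return struct.pack("<Q", offset) * (chunk // 8)


def write_phase(map_path, data_path, file_size, chunk, count, seed=0):
    """Steps 1-4: write chunks via mmap, record offsets via normal writes."""
    rng = random.Random(seed)
    slots = rng.sample(range(file_size // chunk), count)
    with open(map_path, "wb") as f:
        f.truncate(file_size)  # sparse file, no blocks allocated yet
    with open(map_path, "r+b") as f, open(data_path, "wb") as d:
        mm = mmap.mmap(f.fileno(), file_size)
        for slot in slots:
            off = slot * chunk  # chunk-aligned offset (64KB or 4KB)
            mm[off:off + chunk] = pattern(off, chunk)
            d.write(struct.pack("<Q", off))
        mm.close()  # munmap; both files are closed on leaving the block


def drop_caches():
    """Step 5: equivalent of echo 3 > /proc/sys/vm/drop_caches (needs root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def verify_phase(map_path, data_path, chunk):
    """Step 6: re-read each recorded offset and return the bad offsets."""
    bad = []
    with open(data_path, "rb") as d:
        raw = d.read()
    fd = os.open(map_path, os.O_RDONLY)
    try:
        for (off,) in struct.iter_unpack("<Q", raw):
            if os.pread(fd, chunk, off) != pattern(off, chunk):
                bad.append(off)
    finally:
        os.close(fd)
    return bad
```

A run matching the description would be write_phase("mmap-file",
"mmap-data", 2 << 30, 65536, N) for some number of chunks N, then
drop_caches() as root, then verify_phase("mmap-file", "mmap-data",
65536), repeated with a 4096 chunk size. An empty list from
verify_phase means no mismatch was seen.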

-ritesh