Re: unusual behavior of loop dev with backing file in tmpfs

Hugh Dickins <hughd@xxxxxxxxxx> · Tue, 11 Jan 2022 20:28:02 -0800 (PST)

On Fri, 26 Nov 2021, Lukas Czerner wrote:
> 
> I've noticed an unusual test failure in the e2fsprogs testsuite
> (m_assume_storage_prezeroed), where we use mke2fs to create a file system
> on a loop device backed by a file on tmpfs. For some reason, the number
> of blocks allocated to the resulting file (stat -c '%b' /tmp/file)
> sometimes differs, but it really should not.
> 
> I was trying to create a simplified reproducer and noticed the following
> behavior on the mainline kernel (v5.16-rc2-54-g5d9f4cf36721):
> 
> # truncate -s16M /tmp/file
> # stat -c '%b' /tmp/file
> 0
> 
> # losetup -f /tmp/file
> # stat -c '%b' /tmp/file
> 672
> 
> That alone is a little unexpected, since the file is supposed to be
> empty, and when copied out of the tmpfs it really is empty. But the
> following is even weirder.
> 
> We have a loop setup from above, so let's assume it's /dev/loop0. The
> following should be executed in quick succession, like in a script.
> 
> # dd if=/dev/zero of=/dev/loop0 bs=4k
> # blkdiscard -f /dev/loop0
> # stat -c '%b' /tmp/file
> 0
> # sleep 1
> # stat -c '%b' /tmp/file
> 672
> 
> Is that expected behavior? From what I've seen when I use mkfs instead
> of this simplified example, the number of blocks allocated as reported by
> stat can vary quite a lot given more complex operations. The file itself
> does not seem to be corrupted in any way, so it is likely just an
> accounting problem.
> 
> Any idea what is going on there?

I have half an answer; but maybe you worked it all out meanwhile anyway.

Yes, it happens like that for me too: 672 (but 216 on an old installation).

Half the answer is that funny code at the head of shmem_file_read_iter():
	/*
	 * Might this read be for a stacking filesystem?  Then when reading
	 * holes of a sparse file, we actually need to allocate those pages,
	 * and even mark them dirty, so it cannot exceed the max_blocks limit.
	 */
	if (!iter_is_iovec(to))
		sgp = SGP_CACHE;
which allocates pages to the tmpfs for reads from /dev/loop0; whereas
normally a read of a sparse tmpfs file would just give zeroes without
allocating.

[Do we still need that code? Mikulas asked 18 months ago, and I never
responded (sorry) because I failed to arrive at an informed answer.
It comes from a time when unionfs on tmpfs was under active development,
and solved a real problem then; but by the time it went into tmpfs,
unionfs had already been persuaded to proceed differently, and no
longer needed it. I kept it in for indeterminate other stacking FSs,
but it's probably just culted cargo, doing more harm than good. I
suspect the best thing to do is, after the 5.17 merge window closes,
revive Mikulas's patch to delete it and see if anyone complains.]

But what is asynchronously reading /dev/loop0 (instantiating pages
initially, and reinstantiating them after blkdiscard)? I assume it's
some block device tracker, trying to read capacity and/or partition
table; whether from inside or outside the kernel, I expect you'll
guess much better than I can.

Hugh