Re: Splitting a THP beyond EOF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 20, 2020 at 02:43:57AM +0100, Matthew Wilcox wrote:
> This is a weird one ... which is good because it means the obvious
> ones have been fixed and now I'm just tripping over the weird cases.
> And fortunately, xfstests exercises the weird cases.
> 
> 1. The file is 0x3d000 bytes long.
> 2. A readahead allocates an order-2 THP for 0x3c000-0x3ffff
> 3. We simulate a read error for 0x3c000-0x3cfff
> 4. Userspace writes to 0x3d697 to 0x3dfaa

So this is a write() beyond EOF, yes?

If yes, then we first go through this path:

	xfs_file_buffered_aio_write()
	  xfs_file_aio_write_checks()
	    iomap_zero_range(isize, pos - isize)

To zero the region between the current EOF and where the new write
starts. i.e. from 0x3d000 to 0x3d696.

> 5. iomap_write_begin() gets the 0x3c page, sees it's THP and !Uptodate
>    so it calls iomap_split_page() (passing page 0x3d)

Splitting the page because it's !Uptodate seems rather drastic to
me.  Why does it need to split the page here?

Also, this concerns me: if we are exposing the cached EOF page via
mmap, it needs to contain only zeroes in the region beyond EOF so
that we don't expose stale data to userspace. Hence when a THP that
contains EOF is instantiated, we have to ensure that the region
beyond EOF is compeltely zeroed. It then follows that if we read all
the data in that THP up to EOF, then the page is actually up to
date...

And hence, I don't see why we'd need to split it just because we
got a small, unaligned write beyond EOF....

> 6. iomap_split_page() calls split_huge_page()
> 7. split_huge_page() sees that page 0x3d is beyond EOF, so it removes it
>    from i_pages
> 8. iomap_write_actor() copies the data into page 0x3d
> 9. The write is lost.
> 
> Trying to persuade XFS to update i_size before calling
> iomap_file_buffered_write() seems like a bad idea.

Agreed, I can't see anything good coming from doing that...

> Changing split_huge_page() to disregard i_size() is something I kind
> of want to be able to do long-term in order to make hole-punch more
> efficient, but that seems like a lot of work right now.
> 
> I think the easiest way to fix this is to decline to allocate readahead
> pages beyond EOF.  That is, if we have a file which is, say, 61 pages
> long, read the last 5 pages into an order-2 THP and an order-0 THP
> instead of allocating an order-3 THP and zeroing the last three pages.

I think you need to consider zeroing the THP range beyond EOF when
doing readahead....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux