Re: Splitting a THP beyond EOF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 19 Oct 2020, at 21:43, Matthew Wilcox wrote:

This is a weird one ... which is good because it means the obvious
ones have been fixed and now I'm just tripping over the weird cases.
And fortunately, xfstests exercises the weird cases.

1. The file is 0x3d000 bytes long.
2. A readahead allocates an order-2 THP for 0x3c000-0x3ffff
3. We simulate a read error for 0x3c000-0x3cfff
4. Userspace writes to 0x3d697 to 0x3dfaa
5. iomap_write_begin() gets the 0x3c page, sees it's THP and !Uptodate
   so it calls iomap_split_page() (passing page 0x3d)
6. iomap_split_page() calls split_huge_page()
7. split_huge_page() sees that page 0x3d is beyond EOF, so it removes it
   from i_pages
8. iomap_write_actor() copies the data into page 0x3d

I’m guessing that iomap_write_begin() is still in charge of locking the pages, and that iomap_split_page()->split_huge_page() is just reusing that lock?

It sounds like you’re missing a flag to iomap_split_page() that says: I care about range A->B, even if its beyond EOF. IOW, iomap_write_begin()’s path should be in charge of doing the right thing for the write, without relying on the rest of the kernel to avoid upsetting it.

9. The write is lost.

Trying to persuade XFS to update i_size before calling
iomap_file_buffered_write() seems like a bad idea.

Changing split_huge_page() to disregard i_size() is something I kind
of want to be able to do long-term in order to make hole-punch more
efficient, but that seems like a lot of work right now.


The problem with trusting i_size is that it changes at surprising times. For this code inside split_huge_page(), end == i_size_read()

        for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
                __split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
                if (head[i].index >= end) {
                        ClearPageDirty(head + i);

But, we actually change i_size after dropping all the page locks. In xfs this is xfs_setattr_size()->truncate_setsize(), all of which means that dropping PageDirty seems unwise if this code is running concurrently with an expanding truncate. If i_size jumps past the page where you’re clearing dirty, it probably won’t be good. Ignore me if this is already handled differently, it just seems error prone in current Linus.

-chris



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux