On 16 Apr 2023, at 15:44, Hugh Dickins wrote: > On Mon, 3 Apr 2023, Zi Yan wrote: > >> From: Zi Yan <ziy@xxxxxxxxxx> >> >> To minimize the number of pages after a huge page truncation, we do not >> need to split it all the way down to order-0. The huge page has at most >> three parts, the part before offset, the part to be truncated, the part >> remaining at the end. Find the greatest common divisor of them to >> calculate the new page order from it, so we can split the huge >> page to this order and keep the remaining pages as large and as few as >> possible. >> >> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx> >> --- >> mm/truncate.c | 21 +++++++++++++++++++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/mm/truncate.c b/mm/truncate.c >> index 86de31ed4d32..817efd5e94b4 100644 >> --- a/mm/truncate.c >> +++ b/mm/truncate.c >> @@ -22,6 +22,7 @@ >> #include <linux/buffer_head.h> /* grr. try_to_release_page */ >> #include <linux/shmem_fs.h> >> #include <linux/rmap.h> >> +#include <linux/gcd.h> > > Really? > >> #include "internal.h" >> >> /* >> @@ -211,7 +212,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) >> bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) >> { >> loff_t pos = folio_pos(folio); >> - unsigned int offset, length; >> + unsigned int offset, length, remaining; >> + unsigned int new_order = folio_order(folio); >> >> if (pos < start) >> offset = start - pos; >> @@ -222,6 +224,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) >> length = length - offset; >> else >> length = end + 1 - pos - offset; >> + remaining = folio_size(folio) - offset - length; >> >> folio_wait_writeback(folio); >> if (length == folio_size(folio)) { >> @@ -236,11 +239,25 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) >> */ >> folio_zero_range(folio, offset, length); >> >> + /* >> + * Use the greatest common divisor of offset, length, and remaining >> + * as the smallest page size and compute the new order from it. So we >> + * can truncate a subpage as large as possible. Round up gcd to >> + * PAGE_SIZE, otherwise ilog2 can give -1 when gcd/PAGE_SIZE is 0. >> + */ >> + new_order = ilog2(round_up(gcd(gcd(offset, length), remaining), >> + PAGE_SIZE) / PAGE_SIZE); > > Gosh. In mm/readahead.c I can see "order = __ffs(index)", > and I think something along those lines would be more appropriate here. > > But, if there's any value at all to choosing intermediate orders here in > truncation, I don't think choosing a single order is the right approach - > more easily implemented, yes, but is it worth doing? > > What you'd actually want (if anything) is to choose the largest orders > possible, with smaller and smaller orders filling in the rest (I expect > there's a technical name for this, but I don't remember - bin packing > is something else, I think). > > As this code stands, truncate a 2M huge page at 1M and you get two 1M > pieces (one then discarded) - nice; but truncate it at 1M+1 and you get > lots of order 2 (forced up from 1) pieces. Seems weird, and not worth > the effort. The approach I am adding here is the simplest way of splitting a folio and trying to get >0 order folios after the split. Yes, I agree that using "__ffs(index)" can create more >0 order folios, but it comes with either more runtime overheads or more code changes. Like your "1MB + 1" page split example, using "__ffs(index)", ideally, you will split 2MB into 2 1MBs, then 1MB into 2 512KBs, ..., 8KB into 2 4KBs and at the end of the day, we will have 1MB, 512KB, ..., 8KB each and two 2 4KBs, maximizing the number of >0 order folios. But what is the cost? 1. To minimizing code changes, we then need to call split_huge_page_to_list_to_order() 9 times from new_order=8 to new_order=0. Since each split needs to unmap and remap the target folio, we shall see 9 TLB shutdowns. I am not sure it is worth the cost. 2. To get rid of the unmap and remap overheads, we probably can unmap the folio, then do all the 9 splits, then remap all the split pages. But this would make split_huge_page() a lot more complicated and I am not sure a good way of passing split order information and the corresponding to-be-split subpages. Do we need a dynamic list to store them, making new memory allocations a prerequisite of split_huge_page()? Maybe we can encode in the split information in two ints? In the first one, each bit tells which order to split the page (like order=__ffs(index)) and in the second one, each bit tells which subpage to split next (0 means the left subpage, 1 means the right subpage). So your "1MB + 1" page split will be encoded as 0b111111111 (first int), 0b1000000 (second int and it has 1 fewer bit, since first split does not need to know which subpage to split). What do you think? If you have a better idea, I am all ears. And if you are willing to help me review the more complicated code changes, I am more than happy to implement it in the next version. :) Thank you for your comments. They are very helpful! > > Hugh > >> + >> + /* order-1 THP not supported, downgrade to order-0 */ >> + if (new_order == 1) >> + new_order = 0; >> + >> + >> if (folio_has_private(folio)) >> folio_invalidate(folio, offset, length); >> if (!folio_test_large(folio)) >> return true; >> - if (split_folio(folio) == 0) >> + if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0) >> return true; >> if (folio_test_dirty(folio)) >> return false; >> -- >> 2.39.2 -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature