On Mon, 3 Apr 2023, Zi Yan wrote: > From: Zi Yan <ziy@xxxxxxxxxx> > > To split a THP to any lower order pages, we need to reform THPs on > subpages at given order and add page refcount based on the new page > order. Also we need to reinitialize page_deferred_list after removing > the page from the split_queue, otherwise a subsequent split will see > list corruption when checking the page_deferred_list again. > > It has many uses, like minimizing the number of pages after > truncating a huge pagecache page. For anonymous THPs, we can only split > them to order-0 like before until we add support for any size anonymous > THPs. > > Signed-off-by: Zi Yan <ziy@xxxxxxxxxx> > --- ... > @@ -2754,14 +2798,18 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) > if (folio_test_swapbacked(folio)) { > __lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, > -nr); > - } else { > + } else if (!new_order) { > + /* > + * Decrease THP stats only if split to normal > + * pages > + */ > __lruvec_stat_mod_folio(folio, NR_FILE_THPS, > -nr); > filemap_nr_thps_dec(mapping); > } > } This part is wrong. The problem I've had is /proc/sys/vm/stat_refresh warning of negative nr_shmem_hugepages (which then gets shown as 0 in vmstat or meminfo, even though there actually are shmem hugepages). At first I thought that the fix needed (which I'm running with) is: --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2797,17 +2797,16 @@ int split_huge_page_to_list_to_order(str int nr = folio_nr_pages(folio); xas_split(&xas, folio, folio_order(folio)); - if (folio_test_swapbacked(folio)) { - __lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, - -nr); - } else if (!new_order) { - /* - * Decrease THP stats only if split to normal - * pages - */ - __lruvec_stat_mod_folio(folio, NR_FILE_THPS, - -nr); - filemap_nr_thps_dec(mapping); + if (folio_test_pmd_mappable(folio) && + new_order < HPAGE_PMD_ORDER) { + if (folio_test_swapbacked(folio)) { + __lruvec_stat_mod_folio(folio, + NR_SHMEM_THPS, -nr); + } else { + __lruvec_stat_mod_folio(folio, + NR_FILE_THPS, -nr); + filemap_nr_thps_dec(mapping); + } } } because elsewhere the maintenance of NR_SHMEM_THPS or NR_FILE_THPS is rightly careful to be dependent on folio_test_pmd_mappable() (and, so far as I know, we shall not be seeing folios of order higher than HPAGE_PMD_ORDER yet in mm/huge_memory.c - those would need more thought). But it may be more complicated than that, given that patch 7/7 appears (I haven't tried) to allow splitting to other orders on a file opened for reading - that might be a bug. The complication here is that we now have four kinds of large folio in mm/huge_memory.c, and the rules are a bit different for each. Anonymous THPs: okay, I think I've seen you exclude those with -EINVAL at a higher level (and they wouldn't be getting into this "if (mapping) {" block anyway). Shmem (swapbacked) THPs: we are only allocating shmem in 0-order or HPAGE_PMD_ORDER at present. I can imagine that in a few months or a year-or-so's time, we shall want to follow Matthew's folio readahead, and generalize to other orders in shmem; but right now I'd really prefer not to have truncation or debugfs introducing the surprise of other orders there. Maybe there's little that needs to be fixed, only the THP_SWPOUT and THP_SWPOUT_FALLBACK statistics have come to mind so far (would need to be limited to folio_test_pmd_mappable()); though I've no idea how well intermediate orders will work with or against THP swapout. CONFIG_READ_ONLY_THP_FOR_FS=y file THPs: those need special care, and their filemap_nr_thps_dec(mapping) above may not be good enough. So long as it's working as intended, it does exclude the possibility of truncation splitting here; but if you allow splitting via debugfs to reach them, then the accounting needs to be changed - for them, any order higher than 0 has to be counted in nr_thps - so splitting one HPAGE_PMD_ORDER THP into multiple large folios will need to add to that count, not decrement it. Otherwise, a filesystem unprepared for large folios or compound pages is in danger of meeting them by surprise. Better just disable that possibility, along with shmem. mapping_large_folio_support() file THPs: this category is the one you're really trying to address with this series, they can already come in various orders, and it's fair for truncation to make a different choice of orders - but is what it's doing worth doing? I'll say more on 6/7. Hugh