On 1/17/2023 10:46 PM, Matthew Wilcox wrote:
> On Tue, Jan 17, 2023 at 11:37:05AM +0100, David Hildenbrand wrote:
>> On 17.01.23 10:19, Yin, Fengwei wrote:
>>>
>>>
>>> On 1/14/2023 2:13 AM, Matthew Wilcox wrote:
>>>> On Sat, Jan 14, 2023 at 12:35:38AM +0800, Yin Fengwei wrote:
>>>>> The page fault number can be reduced by batched PTEs population.
>>>>> The batch size of PTEs population is not allowed to cross:
>>>>>   - page table boundaries
>>>>>   - vma range
>>>>>   - large folio size
>>>>>   - fault_around_bytes
>>>>
>>>> I find this patch very interesting.  But is it really worth it?  Most
>>>> file-backed page faults are resolved through the ->map_pages() path
>>>> which is almost always filemap_map_pages(), which does something
>>>> fairly similar to this already.  Do you have any performance numbers?
>>>>
>>> I tried the will-it-scale page_fault3 test:
>>> https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault3.c
>>> with 96 processes on a test box with 48C/86T.
>>>
>>> With this patch, the test ran about 3.75x faster with 4.1x fewer
>>> page faults.
>>>
>>> But it's a micro-benchmark, and an extremely friendly case for this
>>> patch. I didn't see an observable performance gain with other
>>> workloads. I suppose shared file write operations may not be common
>>> operations? Thanks.
>>
>> One question I have after reading "which does something fairly similar
>> to this already": could both paths be unified?
>
> I've been thinking about this already; not so much in terms of "unifying
> these two implementations" but rather "what's the right API for mapping
> the parts of a folio that fit into userspace in response to a fault?"
> I haven't got quite as far as drafting code, but I'm thinking there
> should be an API where we pass in the vmf, a folio and some other
> information, and that function takes care of mapping however many pages
> from this folio that it can.

That would be lovely.
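(For illustration only: the four limits listed in the patch description
amount to clamping the batch to the smallest remaining distance. A rough
userspace sketch, with made-up names — `batch_pages()` is not a kernel
function, and the real patch works on PTE pointers, not raw arithmetic:)

```c
/* Sketch of how a fault handler might bound the number of PTEs to
 * populate in one batch.  Illustrative only; names are invented. */

#define PAGE_SHIFT   12
#define PTRS_PER_PTE 512UL  /* one PTE page maps 2 MiB on x86-64 */

static unsigned long min3(unsigned long a, unsigned long b, unsigned long c)
{
	return a < b ? (a < c ? a : c) : (b < c ? b : c);
}

/*
 * Pages we may populate starting at addr before crossing the page
 * table, the vma end, the folio end, or the fault-around window.
 */
static unsigned long batch_pages(unsigned long addr, unsigned long vma_end,
				 unsigned long folio_pages_left,
				 unsigned long fault_around_bytes)
{
	unsigned long pte_idx = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
	unsigned long to_table_end = PTRS_PER_PTE - pte_idx;
	unsigned long to_vma_end = (vma_end - addr) >> PAGE_SHIFT;
	unsigned long around = fault_around_bytes >> PAGE_SHIFT;
	unsigned long n = min3(to_table_end, to_vma_end, folio_pages_left);

	return n < around ? n : around;
}
```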
> And the reason to split it up like that is to batch as many page
> operations into this folio as possible.  eg filemap_map_pages() was
> written "to get it working", not "to be efficient", so it does stupid
> things like call folio_ref_inc() every time it maps a page instead of
> counting how many pages it maps and calling folio_ref_add() at the end.
> Similar optimisations should be done for mapcount, which implies some
> kind of batched equivalent of page_add_*_rmap().

I tried to batch the mapcount updates when I worked on this patch, but
page_add_file_rmap() only supports compound PMD_SIZE folios (maybe it is
time to remove that limitation?).


Regards,
Yin, Fengwei
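(A userspace toy of the refcount batching idea above, for illustration:
count the pages actually mapped in the loop, then do one atomic add
instead of one increment per page. `struct folio` and
`map_folio_pages()` here are stand-ins, not the kernel's types:)

```c
#include <stdatomic.h>

/* Toy stand-in for the kernel's folio refcount. */
struct folio {
	atomic_int refcount;
};

/* Map up to nr pages of the folio; return how many were mapped. */
static int map_folio_pages(struct folio *folio, int nr)
{
	int mapped = 0;

	for (int i = 0; i < nr; i++) {
		/* ... the per-page PTE setup would go here ... */
		mapped++;
	}
	/* One atomic add for the whole batch, not nr increments. */
	atomic_fetch_add(&folio->refcount, mapped);
	return mapped;
}
```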