On Tue, Jan 17, 2023 at 11:37:05AM +0100, David Hildenbrand wrote: > On 17.01.23 10:19, Yin, Fengwei wrote: > > > > > > On 1/14/2023 2:13 AM, Matthew Wilcox wrote: > > > On Sat, Jan 14, 2023 at 12:35:38AM +0800, Yin Fengwei wrote: > > > > The page fault number can be reduced by batched PTEs population. > > > > The batch size of PTEs population is not allowed to cross: > > > > - page table boundaries > > > > - vma range > > > > - large folio size > > > > - fault_around_bytes > > > > > > I find this patch very interesting. But is it really worth it? Most > > > file-backed page faults are resolved through the ->map_pages() path > > > which is almost always filemap_map_pages(), which does something > > > fairly similar to this already. Do you have any performance numbers? > > > > > I tried the will-it-scale page_fault3: > > https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault3.c > > with 96 processes on a test box with 48C/86T. > > > > The test result got about 3.75X better with 4.1X less page fault number > > with this patch. > > > > But It's a micro benchmark which shows extreme friendly case to this patch. > > > > I didn't see observed performance gain with other workloads. I suppose > > shared file write operations may not be common operations? Thanks. > > One question I have after reading "which does something fairly similar to > this already", if both paths could be unified. I've been thinking about this already; not so much in terms of "unifying these two implementations" but rather "What's the right API for mapping the parts of a folio that fit into userspace in response to a fault". I haven't got quite as far as drafting code, but I'm thinking there should be an API where we pass in the vmf, a folio and some other information, and that function takes care of mapping however many pages from this folio that it can. And the reason to split it up like that is to batch as many page operations into this folio as possible. eg filemap_map_pages() was written "to get it working", not "to be efficient", so it does stupid things like call folio_ref_inc() every time it maps a page instead of counting how many pages it maps and calling folio_ref_add() at the end. Similar optimisations should be done for mapcount, which implies some kind of batched equivalent of page_add_*_rmap().