On 2/1/2023 11:50 PM, Matthew Wilcox wrote:
> On Wed, Feb 01, 2023 at 04:17:37PM +0800, Yin Fengwei wrote:
>> Use do_set_pte_range() in filemap_map_folio_range(), which
>> batches the mm counter and rmap updates.
>>
>> With a self-cooked will-it-scale.page_fault3-like app (changing
>> the file write fault to a read fault), we got a 15% performance gain.
>
> I'd suggest that you create a will-it-scale.page_fault4.  Anton
> is quite open to adding new variations of tests.
>
>> Perf data collected before/after the change:
>>
>> 18.73%--page_add_file_rmap
>>          |
>>           --11.60%--__mod_lruvec_page_state
>>                     |
>>                     |--7.40%--__mod_memcg_lruvec_state
>>                     |          |
>>                     |           --5.58%--cgroup_rstat_updated
>>                     |
>>                      --2.53%--__mod_lruvec_state
>>                                |
>>                                 --1.48%--__mod_node_page_state
>>
>> 9.93%--page_add_file_rmap_range
>>          |
>>           --2.67%--__mod_lruvec_page_state
>>                    |
>>                    |--1.95%--__mod_memcg_lruvec_state
>>                    |          |
>>                    |           --1.57%--cgroup_rstat_updated
>>                    |
>>                     --0.61%--__mod_lruvec_state
>>                               |
>>                                --0.54%--__mod_node_page_state
>>
>> The running time of __mod_lruvec_page_state() is reduced a lot.
>
> Nice.
>
>> +++ b/mm/filemap.c
>> @@ -3364,11 +3364,22 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  	struct file *file = vma->vm_file;
>>  	struct page *page = folio_page(folio, start);
>>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>> -	unsigned int ref_count = 0, count = 0;
>> +	unsigned int ref_count = 0, count = 0, nr_mapped = 0;
>>  
>>  	do {
>> -		if (PageHWPoison(page))
>> +		if (PageHWPoison(page)) {
>> +			if (nr_mapped) {
>> +				vmf->pte -= nr_mapped;
>> +				do_set_pte_range(vmf, folio,
>> +						start + count - nr_mapped,
>> +						addr - nr_mapped * PAGE_SIZE,
>> +						nr_mapped);
>> +			}
>> +
>> +			nr_mapped = 0;
>>  			continue;
>> +		}
>
> Having subtracted nr_mapped from vmf->pte, we then need to add it again.
>
> But this is all too complicated.  What if we don't update vmf->pte
> each time around the loop?
> i.e. something like this:
>
> 	do {
> 		if (PageHWPoison(page))
> 			goto map;
> 		if (mmap_miss > 0)
> 			mmap_miss--;
> 		/*
> 		 * If there're PTE markers, we'll leave them to be handled
> 		 * in the specific fault path, and it'll prohibit the
> 		 * fault-around logic.
> 		 */
> 		if (!pte_none(vmf->pte[count]))
> 			goto map;
> 		if (vmf->address == addr + count * PAGE_SIZE)
> 			ret = VM_FAULT_NOPAGE;
> 		continue;
> map:
> 		if (count > 1) {
> 			do_set_pte_range(vmf, folio, start, addr, count - 1);
> 			folio_ref_add(folio, count - 1);
> 		}
> 		start += count;
> 		vmf->pte += count;
> 		addr += count * PAGE_SIZE;
> 		nr_pages -= count;
> 		count = 0;

I found it's not easy to make this part correct, so I tried to
simplify the logic here a little bit.


Regards
Yin, Fengwei

> 	} while (page++, ++count < nr_pages);
>
> 	if (count > 0) {
> 		do_set_pte_range(vmf, folio, start, addr, count);
> 		folio_ref_add(folio, count);
> 	} else {
> 		/* Make sure the PTE points to the correct page table */
> 		vmf->pte--;
> 	}
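For illustration, the batching idea in the patch — count contiguous mappable
pages in nr_mapped and flush them as one do_set_pte_range() call when a
poisoned page (or the end of the range) terminates the run — can be modeled
in standalone userspace C. This is only a sketch: struct batch,
map_range_batched() and the poisoned array are invented stand-ins for the
kernel's folio/PTE machinery, not real kernel API.

```c
#include <assert.h>

/*
 * Userspace model of the batching in filemap_map_folio_range():
 * contiguous mappable pages accumulate in nr_mapped and are flushed
 * as a single range (the stand-in for one do_set_pte_range() call)
 * whenever a poisoned page, or the end of the range, ends the run.
 */
struct batch {
	int start;	/* first page index of the flushed run */
	int len;	/* number of pages in the run */
};

static int map_range_batched(const int *poisoned, int nr_pages,
			     struct batch *flushes)
{
	int nr_flushes = 0, nr_mapped = 0;

	for (int i = 0; i < nr_pages; i++) {
		if (poisoned[i]) {
			if (nr_mapped) {
				/* flush: one do_set_pte_range() per run */
				flushes[nr_flushes].start = i - nr_mapped;
				flushes[nr_flushes].len = nr_mapped;
				nr_flushes++;
			}
			nr_mapped = 0;
			continue;
		}
		nr_mapped++;
	}
	if (nr_mapped) {
		/* trailing run after the loop, as in the real code */
		flushes[nr_flushes].start = nr_pages - nr_mapped;
		flushes[nr_flushes].len = nr_mapped;
		nr_flushes++;
	}
	return nr_flushes;
}
```

With pages {ok, ok, poisoned, ok, ok, ok} this produces two flushes,
(0, 2) and (3, 3), i.e. one mm counter/rmap update per run instead of
one per page.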
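Matthew's restructured loop amounts to the same run-flushing, written so the
pointers advance only at batch boundaries rather than being adjusted back and
forth each iteration. A cleaned-up userspace model of that shape (again with
invented names — skip stands for anything unmappable, whether a HWPoison page
or an already-populated PTE) might look like this:

```c
#include <assert.h>

/*
 * Userspace model of the suggested loop shape: extend the current run
 * while pages are mappable, flush it as one range, then step start
 * past the page that terminated the run. Nothing is adjusted per page;
 * start (standing in for vmf->pte/addr) moves only at run boundaries.
 */
struct run {
	int start;
	int len;
};

static int map_runs(const int *skip, int nr_pages, struct run *runs)
{
	int nr_runs = 0, start = 0;

	while (start < nr_pages) {
		int count = 0;

		/* extend the run over consecutive mappable pages */
		while (start + count < nr_pages && !skip[start + count])
			count++;

		if (count) {
			/* one do_set_pte_range() + folio_ref_add() per run */
			runs[nr_runs].start = start;
			runs[nr_runs].len = count;
			nr_runs++;
		}
		/* step past the page that terminated the run */
		start += count + 1;
	}
	return nr_runs;
}
```

For pages {ok, skip, ok, ok, skip, ok} this yields runs (0, 1), (2, 2)
and (5, 1); the subtract-then-re-add dance on vmf->pte from the original
patch disappears because the position only ever moves forward.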