On 11/24/2023 12:05 AM, Yin, Fengwei wrote:
>
> On 11/20/2023 9:48 PM, kernel test robot wrote:
>>
>> hi, Fengwei,
>>
>> we noticed c8be038067 is the fix commit for
>> de74976eb65151a2f568e477fc2e0032df5b22b4 ("filemap: add filemap_map_folio_range()")
>>
>> and we captured numbers of improvement for this commit
>> (refer to the below
>> "In addition to that, the commit also has significant impact on the following tests"
>> part, which includes several examples)
>>
>> however, recently, we found a regression as the title mentions.
>
> I can reproduce the regression on an Ice Lake platform with 256G memory + 72C/144T.
>
>>
>> the extra information we want to share is that we also tested on mainline tip when
>> this bisect was done, and noticed the regression disappear:
>
> I can also reproduce this "regression disappears on latest mainline". And I found
> the related commit is:
>
> commit f5617ffeb450f84c57f7eba1a3524a29955d42b7
> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Date:   Mon Jul 24 19:54:08 2023 +0100
>
>     mm: run the fault-around code under the VMA lock
>
> With this commit, map_pages() can be called twice: the first time with the VMA
> lock held, the second time with mmap_lock held (and the second call sets no PTE
> because of !pte_none()).
>
> With this commit reverted, the regression is mitigated to some extent.
>
>
> And the reason the "regression disappears on latest mainline" is:
>
> commit 12214eba1992642eee5813a9cc9f626e5b2d1815 (test6)
> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Date:   Fri Oct 6 20:53:17 2023 +0100
>
>     mm: handle read faults under the VMA lock
>
>
> This commit eliminates the second call of map_pages(), and the regression is
> mitigated to some extent.
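To make the double map_pages() call concrete, here is simplified pseudocode of the do_read_fault() path as I read it after f5617ffeb450; this is a sketch from memory, not the literal kernel source:

```
/*
 * Simplified pseudocode of do_read_fault() after f5617ffeb450
 * (not the literal kernel code):
 */
do_read_fault(vmf):
    if (should_fault_around(vmf)):
        ret = do_fault_around(vmf)     /* -> ->map_pages(), 1st call,
                                          possibly under the VMA lock */
        if (ret)
            return ret
    if (vmf->flags & FAULT_FLAG_VMA_LOCK):
        vma_end_read(vmf->vma)
        return VM_FAULT_RETRY          /* the retry takes mmap_lock and
                                          re-enters this path, so
                                          ->map_pages() runs a 2nd time;
                                          it finds !pte_none() PTEs and
                                          sets nothing */
    ret = __do_fault(vmf)
    ...
```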
>
> We may still need to move the following code block before the fault-around:
>
> 	ret = vmf_can_call_fault(vmf);
> 	if (ret)
> 		return ret;
>
>>
>> # extra tests on head commit of linus/master
>> # good: [9bacdd8996c77c42ca004440be610692275ff9d0] Merge tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>>
>> the data is even better than the parent:
>>
>>   "vm-scalability.throughput": [
>>     54122867,
>>     58465096,
>>     55888729,
>>     56960948,
>>     56385702,
>>     56392203
>>   ],
>>
>> and we also reverted c8be038067 on mainline tip, but found no big diff:
>>
>> # extra tests on revert first bad commit
>> # good: [f13a82be4c3252dabd1328a437c309d84ec7a872] Revert "filemap: add filemap_map_order0_folio() to handle order0 folio"
>>
>>   "vm-scalability.throughput": [
>>     56434337,
>>     56199754,
>>     56214041,
>>     55308070,
>>     55401115,
>>     55709753
>>   ],
>>
>>
>> commit:
>>   578d7699e5 ("proc: nommu: /proc/<pid>/maps: release mmap read lock")
>>   c8be038067 ("filemap: add filemap_map_order0_folio() to handle order0 folio")
>>
>> 578d7699e5c2add8 c8be03806738c86521dbf1e0503
>> ---------------- ---------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>     146.95 ±  8%     -83.0%      24.99 ±  3%  vm-scalability.free_time
>>     233050           -28.1%     167484        vm-scalability.median
>>     590.30 ± 12%    -590.2        0.06 ± 45%  vm-scalability.stddev%
>>   51589606           -27.3%   37516397        vm-scalability.throughput
>
> I found very interesting behavior:
>
> 1. I am sure filemap_map_order0_folio() is faster than filemap_map_folio_range()
>    if the folio has order 0 (always true for shmem for now).
>
> 2. If I use an ebpf trace tool to count how many times filemap_map_order0_folio()
>    is called while vm-scalability is running, the test result becomes better. In
>    general, adding tracing overhead should make the result worse.
>
>    It looks like a slower filemap_map_order0_folio() can produce a better
>    vm-scalability result. I did another test, adding a 2us delay in
>    filemap_map_order0_folio(), and the vm-scalability result improved a little.
>
> 3.
> Using perf with 578d7699e5 and c8be038067:
>
> for do_read_fault() with 578d7699e5:
> - 48.58%  0.04%  usemem  [kernel.vmlinux]  [k] do_read_fault
>    - 48.54% do_read_fault
>       - 44.34% filemap_map_pages
>           19.45% filemap_map_folio_range
>            3.22% next_uptodate_folio
>          + 1.72% folio_wake_bit
>       + 3.29% __do_fault
>         0.65% folio_wake_bit
>
> for do_read_fault() with c8be038067:
> - 72.98%  0.09%  usemem  [kernel.vmlinux]  [k] do_read_fault
>    - 72.89% do_read_fault
>       - 52.70% filemap_map_pages
>           32.10% next_uptodate_folio    <----- much higher than 578d7699e5
>          + 12.35% folio_wake_bit
>          + 1.53% set_pte_range
>       + 12.43% __do_fault
>       + 6.36% folio_wake_bit            <----- higher than 578d7699e5
>       + 0.97% finish_fault
>
> My theory is that the faster filemap_map_order0_folio() brings more contention
> in next_uptodate_folio().
>
> 4. I finally located the part of the code that brought the higher contention in
>    next_uptodate_folio(). It's related to the following change:
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 582f5317ff71..056a2d2e2428 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3481,7 +3481,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct file *file = vma->vm_file;
>  	struct page *page = folio_page(folio, start);
> -	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>  	unsigned int count = 0;
>  	pte_t *old_ptep = vmf->pte;
>
> @@ -3489,9 +3488,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		if (PageHWPoison(page + count))
>  			goto skip;
>
> -		if (mmap_miss > 0)
> -			mmap_miss--;
> -
>  		/*
>  		 * NOTE: If there're PTE markers, we'll leave them to be
>  		 * handled in the specific fault path, and it'll prohibit the
> @@ -3525,7 +3521,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	}
>
>  	vmf->pte = old_ptep;
> -	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>
>  	return ret;
>  }
>
> If I apply the above change to 578d7699e5, next_uptodate_folio() overhead rises.
> The perf command got:
>
> - 68.83%  0.08%  usemem  [kernel.vmlinux]  [k] do_read_fault
>    - 68.75% do_read_fault
>       - 49.34% filemap_map_pages
>           29.71% next_uptodate_folio
>          + 11.82% folio_wake_bit
>            2.34% filemap_map_folio_range
>       - 11.97% __do_fault
>          + 11.93% shmem_fault
>       + 6.12% folio_wake_bit
>       + 0.92% finish_fault
>
> And vm-scalability dropped to the same level as latest mainline.
>
> IMHO, we should slowdown filemap_map_order0_folio() because other benchmark can get benefit

Sorry for the typo. I meant "we should not slow down filemap_map_order0_folio()...".

> from faster filemap_map_order0_folio(). Thanks.
>
>
> Regards
> Yin, Fengwei
>