reply: reply: reply: [RFC PATCH 1/1] mm: mark folio accessed in minor fault

黄朝阳 (Zhaoyang Huang) <zhaoyang.huang@xxxxxxxxxx> · Tue, 2 Jan 2024 05:46:13 +0000



>I updated the patch on v515 as below [1]. It calls
>mark_page_accessed(mapped_page) in filemap_map_pages and
>filemap_fault (in the minor fault path) to promote non-single-use pages (major
>faults are excluded) earlier than the first scan does now. The patch was verified
>on a 2GB RAM Android system with a test script that loops the
>following 4 steps [2] 30 times and counts the trace events
>"mm_filemap_add_to_page_cache/mm_filemap_delete_from_page_cache",
>which can be regarded as the page cache retention rate (or the thrashing
>rate). According to the test results, the RFC saves about 10% of add_to events in
>each phase and improves APP start time (a 5% decrease).
>
>[1]
>diff --git a/mm/filemap.c b/mm/filemap.c
>index 279380c..308e415 100644
>--- a/mm/filemap.c
>+++ b/mm/filemap.c
>@@ -3124,6 +3125,7 @@
>  filemap_invalidate_lock_shared(mapping);
>  mapping_locked = true;
>  }
>+ mark_page_accessed(page);
>  } else {
>  /* No page in the page cache at all */
>  count_vm_event(PGMAJFAULT);
>@@ -3147,7 +3149,7 @@
>  goto out_retry;
>  filemap_invalidate_unlock_shared(mapping);
>  return VM_FAULT_OOM;
>  }
> }
>
>  if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
>@@ -3388,8 +3390,10 @@
>
>  /* We're about to handle the fault */
>  if (vmf->address == addr)
>   ret = VM_FAULT_NOPAGE;
>
>+ if (page_mapcount(page))
>+ mark_page_accessed(page);
>  do_set_pte(vmf, page, addr);
>  /* no need to invalidate: a not-present page won't be cached */
>  update_mmu_cache(vma, addr, vmf->pte);
>
>[2]
>1. start an APP
>2. malloc and mlock 512MB pages
>3. restart the APP
>4. kill the APP
>

Sorry, Gmail's plain-text mode mangled the table, so I am resending the result table via Outlook:
                                  v515                       RFC
                                  add_to     delete_from     add_to     delete_from
1. start an APP                   41290      88279           32235      79305
2. malloc and mlock 512MB pages   342103     374396          304310     339048
3. restart the APP                46552      162456          42279      176368
4. kill the APP

>
>On Sat, Dec 23, 2023 at 10:41 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>>
>> On Fri, Dec 22, 2023 at 2:41 AM 黄朝阳 (Zhaoyang Huang)
>> <zhaoyang.huang@xxxxxxxxxx> wrote:
>> >
>> >
>> >
>> > On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>> > >
>> > > On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
>> > > <zhaoyang.huang@xxxxxxxxxx> wrote:
>> > > >
>> > > >
>> > > > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> wrote:
>> > > > >
>> > > > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>> > > > > >
>> > > > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> wrote:
>> > > > > > >
>> > > > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>> > > > > > > > >
>> > > > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
>> > > > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
>> > > > > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>> > > > > > > > > > > >
>> > > > > > > > > > > > Inactive mapped folios are promoted to active
>> > > > > > > > > > > > only when they are scanned in shrink_inactive_list,
>> > > > > > > > > > > > while VFS folios are promoted immediately when they are
>> > > > > > > > > > > > accessed. This has two effects:
>> > > > > > > > > > > >
>> > > > > > > > > > > > 1. NR_ACTIVE_FILE is not as accurate as expected.
>> > > > > > > > > > > > 2. Reclaim efficiency is lowered by dummy inactive folios
>> > > > > > > > > > > >    that should have been kept active as early as shrink_active_list.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I would like to suggest marking the folio accessed
>> > > > > > > > > > > > in the minor fault path to solve this.
>> > > > > > > > > > >
>> > > > > > > > > > > This isn't going to be as effective as you
>> > > > > > > > > > > imagine.  Almost all file faults are handled
>> > > > > > > > > > > through filemap_map_pages().  So I must ask, what
>> > > > > > > > > > > testing have you done with this patch?
>> > > > > > > > > > >
>> > > > > > > > > > > And while you're gathering data, what effect would
>> > > > > > > > > > > this patch have on your workloads?
>> > > > > > > > > > Thanks for the heads-up; I was out of date on the readahead
>> > > > > > > > > > mechanism. My goal
>> > > > > > > > >
>> > > > > > > > > It's not a terribly new mechanism ...
>> > > > > > > > > filemap_map_pages() was added nine years ago in 2014
>> > > > > > > > > by commit f1820361f83d
>> > > > > > > > >
>> > > > > > > > > > is to have mapped file pages behave like other pages
>> > > > > > > > > > which can be promoted immediately when they are
>> > > > > > > > > > accessed. I will update the patch and provide benchmark
>> > > > > > > > > > data in a new patch set.
>> > > > > > > > >
>> > > > > > > > > Understood.  I don't know the history of this, so I'm
>> > > > > > > > > not sure if the decision to not mark folios as accessed here
>> > > > > > > > > was intentional or not.
>> > > > > > > > > I suspect it's entirely unintentional.
>> > > > > > > >
>> > > > > > > > It's intentional. For the active/inactive LRU, all
>> > > > > > > > folios start inactive. The first scan of a folio
>> > > > > > > > transfers the A-bit (if it's set during the initial
>> > > > > > > > fault) to PG_referenced; the second scan of this folio,
>> > > > > > > > if the A-bit is set again, moves it to the active list.
>> > > > > > > > This way single-use folios, i.e., folios mapped for file
>> > > > > > > > streaming, can be reclaimed quickly, since they are "demoted"
>> > > > > > > > rather than "promoted" on the second scan. This RFC would regress memory
>> > > > > > > > streaming workloads.
>> > > > > > > Thanks. Please correct me if I am wrong. IMO, there will
>> > > > > > > be no minor-fault for single-use folios
>> > > > > >
>> > > > > > Why not? What prevents a specific *access pattern* from triggering
>> > > > > > minor faults?
>> > > > > Please find the following chart for the mapped page state machine
>> > > > > transitions.
>> > > >
>> > > > > I'm not sure what you are asking me to look at -- is the
>> > > > > following trying to illustrate something related to my question above?
>> > > >
>> > > > Sorry, the table was generated incorrectly, so I am resending it. I am trying
>> > > > to show how the RFC affects a page's state transitions:
>> > > >
>> > > > 1. The RFC behaves the same as mainline in (1) and (2).
>> > > > 2. VM_EXEC mapped pages are activated earlier than in mainline, which
>> > > >    helps improve scan efficiency in (3) and (4).
>> > > > 3. Non-VM_EXEC mapped pages are dropped during the 3rd scan, as VFS
>> > > >    pages are.
>> > > >
>> > > > (1)
>> > > >              1st access   shrink_active_list   1st scan (shrink_folio_list)   2nd scan (shrink_folio_list')
>> > > > mainline     INA/UNR      NA                   INA/REF                        DROP
>> > > > RFC          INA/UNR      NA                   INA/REF                        DROP
>> > >
>> > > > I don't think this is the case -- with this RFC, *readahead*
>> > > > folios, which are added into pagecache as INA/UNR, become
>> > > > PG_referenced upon the initial fault (first access), i.e.,
>> > > > INA/REF. The first scan will actually activate them, i.e., they
>> > > > become ACT/UNR, because they have both PG_referenced and the
>> > > > A-bit.
>> > > No, sorry for the confusion. This RFC actually targets minor faults
>> > > on already-faulted pages (with at least one PTE set up). For
>> > > readahead pages, could we solve it by adding one criterion as below,
>> > > which unifies all kinds of mapped pages in the RFC?
>>
>> Again this is still wrong -- how do you know the other process mapping
>> this folio isn't also streaming the file?
>>
>> It'd be best to take a step back and think through my original
>> question: what prevents a specific *access pattern* from triggering
>> minor faults? The simple answer is that you can't.
>I agree with that, and it puzzles me more. The RFC's goal is that the more minor
>faults a page takes, the sooner it gets promoted, as VFS pages are.
>
>It's intentional. For the active/inactive LRU, all folios start inactive. The first
>scan of a folio transfers the A-bit (if it's set during the initial fault) to
>PG_referenced; [RFC behaves the same as above] the second scan of this folio,
>if the A-bit is set again, moves it to the active list.
>[The RFC does NOT oppose this; it just lets minor faults promote the page in advance]
>This way single-use folios, i.e., folios mapped for file streaming, can be
>reclaimed quickly, since they are "demoted" rather than "promoted" on the
>second scan. This RFC would regress memory streaming workloads.
>
>
>>
>> > > @@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>> > >         */
>> > >        folio = filemap_get_folio(mapping, index);
>> > >        if (likely(!IS_ERR(folio))) {
>> > > +               /*
>> > > +                * Try to promote the inactive folio here when it is
>> > > +                * accessed via a minor fault.
>> > > +                */
>> > > +               if (folio_mapcount(folio))
>> > > +                       folio_mark_accessed(folio);
>> > >                /*
>> > >                 * We found the page, so try async readahead before waiting for
>> > >                 * the lock.
>> > >
>> > Please find below the state machine tables for the updated RFC, where the RFC
>> > behaves the same as mainline or improves scan efficiency by promoting the page in
>> > shrink_active_list.
>> >
>> > (1)
>> >              1st access   shrink_active_list   1st scan (shrink_folio_list)   2nd scan (shrink_folio_list')
>> > mainline     INA/UNR      NA                   INA/REF                        DROP
>> > RFC          INA/UNR      NA                   INA/REF                        DROP
>> > RA           INA/UNR      NA                   INA/REF                        DROP
>> >
>> > (2)
>> >              1st access   2nd access   shrink_active_list   1st scan (shrink_folio_list)
>> > mainline     INA/UNR      INA/UNR      NA                   ACT/REF
>> > RFC          INA/UNR      INA/REF      NA                   ACT/REF
>> > RA           INA/UNR      INA/REF      NA                   ACT/REF
>> >
>> > (3)
>> >                     1st access   1st scan (shrink_folio_list)   2nd access   2nd scan (shrink_active_list)   3rd scan (shrink_folio_list)
>> > mainline            INA/UNR      INA/REF                        INA/REF      NA                              ACT/REF
>> > RFC (VM_EXEC)       INA/UNR      INA/REF                        ACT/REF      ACT/REF                         NA
>> > RFC (non VM_EXEC)   INA/UNR      INA/REF                        ACT/REF      INA/REF                         DROP
>> > RA                  INA/UNR      INA/REF                        INA/REF      NA                              ACT/REF
>> >
>> > (4)
>> >                     1st access   2nd access   3rd access   1st scan (shrink_active_list)   2nd scan (shrink_folio_list)
>> > mainline            INA/UNR      INA/UNR      INA/UNR      NA                              ACT/REF
>> > RFC (VM_EXEC)       INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
>> > RFC (non VM_EXEC)   INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
>> > RA                  INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
>> > > >
>> > > > So it doesn't behave the same way the mainline does for the
>> > > > first case you listed. (I didn't look at the rest of the cases.)