Hi Mark,

On Thu, Dec 01, 2022 at 01:46:36PM +0000, Mark Hemment wrote:
> On Wed, 9 Nov 2022 at 05:19, Pavankumar Kondeti
> <quic_pkondeti@xxxxxxxxxxx> wrote:
> >
> > When MADV_PAGEOUT is called on a private file mapping VMA region,
> > we bail out early if the process is neither owner nor write capable
> > of the file. However, this VMA may have both private/shared clean
> > pages and private dirty pages. The opportunity of paging out the
> > private dirty pages (Anon pages) is missed. Fix this by caching
> > the file access check and use it later along with PageAnon() during
> > page walk.
> >
> > We observe ~10% improvement in zram usage, thus leaving more available
> > memory on a 4GB RAM system running Android.
> >
> > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@xxxxxxxxxxx>
>
> Only scanned review the patch; the logic looks good (as does the
> reasoning) but a couple of minor comments;
>
Thanks for the review and the nice suggestions on how the patch can be
improved.

> > ---
> >  mm/madvise.c | 30 +++++++++++++++++++++++-------
> >  1 file changed, 23 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index c7105ec..b6b88e2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -40,6 +40,7 @@
> >  struct madvise_walk_private {
> >  	struct mmu_gather *tlb;
> >  	bool pageout;
> > +	bool can_pageout_file;
> >  };
> >
> >  /*
> > @@ -328,6 +329,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  	struct madvise_walk_private *private = walk->private;
> >  	struct mmu_gather *tlb = private->tlb;
> >  	bool pageout = private->pageout;
> > +	bool pageout_anon_only = pageout && !private->can_pageout_file;
> >  	struct mm_struct *mm = tlb->mm;
> >  	struct vm_area_struct *vma = walk->vma;
> >  	pte_t *orig_pte, *pte, ptent;
> > @@ -364,6 +366,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (page_mapcount(page) != 1)
> >  			goto huge_unlock;
> >
> > +		if (pageout_anon_only && !PageAnon(page))
> > +			goto huge_unlock;
> > +
> >  		if (next - addr != HPAGE_PMD_SIZE) {
> >  			int err;
> >
> > @@ -432,6 +437,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (PageTransCompound(page)) {
> >  			if (page_mapcount(page) != 1)
> >  				break;
> > +			if (pageout_anon_only && !PageAnon(page))
> > +				break;
> >  			get_page(page);
> >  			if (!trylock_page(page)) {
> >  				put_page(page);
> > @@ -459,6 +466,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  		if (!PageLRU(page) || page_mapcount(page) != 1)
> >  			continue;
> >
> > +		if (pageout_anon_only && !PageAnon(page))
> > +			continue;
> > +
>
> The added PageAnon()s probably do not have a measurable performance
> impact, but not ideal when walking a large anonymous mapping (as
> '->can_pageout_file' is zero for anon mappings).
> Could the code be re-structured so that PageAnon() is only tested when
> filtering is needed? Say;
>     if (pageout_anon_only_filter && !PageAnon(page)) {
>         continue;
>     }
> where 'pageout_anon_only_filter' is only set for a private named
> mapping when do not have write perms on backing object. It would not
> be set for anon mappings.
>
Understood. As you suggested, the PageAnon() check can be eliminated for
an anon mapping; I will make the necessary changes.
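Something along these lines, perhaps (untested sketch, only to
illustrate; it reuses your 'pageout_anon_only_filter' name as a new
field in struct madvise_walk_private, final naming to be decided):

	struct madvise_walk_private walk_private = {
		.pageout = true,
		.tlb = tlb,
		/*
		 * Decide the filter once per VMA: true only for a
		 * named mapping we lack write permission on.  It
		 * stays false for anon mappings, so their page walk
		 * never tests PageAnon().
		 */
		.pageout_anon_only_filter = !vma_is_anonymous(vma) &&
					    !can_do_file_pageout(vma),
	};

The walk sites would then test
'private->pageout_anon_only_filter && !PageAnon(page)'.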
> >  		VM_BUG_ON_PAGE(PageTransCompound(page), page);
> >
> >  		if (pte_young(ptent)) {
> > @@ -541,11 +551,13 @@ static long madvise_cold(struct vm_area_struct *vma,
> >
> >  static void madvise_pageout_page_range(struct mmu_gather *tlb,
> >  			     struct vm_area_struct *vma,
> > -			     unsigned long addr, unsigned long end)
> > +			     unsigned long addr, unsigned long end,
> > +			     bool can_pageout_file)
> >  {
> >  	struct madvise_walk_private walk_private = {
> >  		.pageout = true,
> >  		.tlb = tlb,
> > +		.can_pageout_file = can_pageout_file,
> >  	};
> >
> >  	tlb_start_vma(tlb, vma);
> > @@ -553,10 +565,8 @@ static void madvise_pageout_page_range(struct mmu_gather *tlb,
> >  	tlb_end_vma(tlb, vma);
> >  }
> >
> > -static inline bool can_do_pageout(struct vm_area_struct *vma)
> > +static inline bool can_do_file_pageout(struct vm_area_struct *vma)
> >  {
> > -	if (vma_is_anonymous(vma))
> > -		return true;
> >  	if (!vma->vm_file)
> >  		return false;
> >  	/*
> > @@ -576,17 +586,23 @@ static long madvise_pageout(struct vm_area_struct *vma,
> >  {
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	struct mmu_gather tlb;
> > +	bool can_pageout_file;
> >
> >  	*prev = vma;
> >  	if (!can_madv_lru_vma(vma))
> >  		return -EINVAL;
> >
> > -	if (!can_do_pageout(vma))
> > -		return 0;
>
> The removal of this test results in a process, which cannot get write
> perms for a shared named mapping, performing a 'walk'. As such a
> mapping cannot have anon pages, this walk will be a no-op. Not sure
> why a well-behaved program would do a MADV_PAGEOUT on such a mapping,
> but if one does this could be considered a (minor performance)
> regression. As madvise_pageout() can easily filter this case, might be
> worth adding a check.
>
Got it. We can take care of this edge case by rejecting shared mappings,
i.e. !!(vma->vm_flags & VM_MAYSHARE) == 1, where the process has no
write permission; a rough sketch is at the end of this mail.

> > +	/*
> > +	 * If the VMA belongs to a private file mapping, there can be private
> > +	 * dirty pages which can be paged out even if this process is neither
> > +	 * owner nor write capable of the file. Cache the file access check
> > +	 * here and use it later during page walk.
> > +	 */
> > +	can_pageout_file = can_do_file_pageout(vma);
> >
> >  	lru_add_drain();
> >  	tlb_gather_mmu(&tlb, mm);
> > -	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
> > +	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr, can_pageout_file);
> >  	tlb_finish_mmu(&tlb);
> >
> >  	return 0;
> > --
> > 2.7.4
> >
>
Thanks,
Pavan
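P.S. the early bail-out I have in mind for madvise_pageout(), roughly
(untested sketch):

	/*
	 * A shared named mapping we cannot write to cannot contain
	 * anon pages, so walking it would be a no-op; reject it early
	 * instead of performing the walk.
	 */
	if (!vma_is_anonymous(vma) && !can_do_file_pageout(vma) &&
	    (vma->vm_flags & VM_MAYSHARE))
		return 0;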