Sounds good. Reviewed-by: William Kucharski <william.kucharski@xxxxxxxxxx> > On Apr 6, 2021, at 11:48 AM, Collin Fijalkovich <cfijalkovich@xxxxxxxxxx> wrote: > > Instrumenting filemap_nr_thps_inc() should be sufficient for ensuring > writable file mappings will not be THP-backed. > > If filemap_nr_thps_dec() in unaccount_page_cache_page() and > filemap_nr_thps() in do_dentry_open() race, the worst case is an > unnecessary truncation. We could introduce a memory barrier in > unaccount_page_cache_page(), but I'm unsure it would significantly > mitigate the risk of spurious truncations. Barring synchronization > between do_dentry_open() and ongoing page cache operations, there > could be an in-progress delete_from_page_cache_batch() that has not > yet updated accounting for THPs in its targeted range. > > -- Collin > > On Mon, Apr 5, 2021 at 7:05 PM William Kucharski > <william.kucharski@xxxxxxxxxx> wrote: >> >> >> I saw a similar change a few years ago with my prototype: >> >> https://lore.kernel.org/linux-mm/5BB682E1-DD52-4AA9-83E9-DEF091E0C709@xxxxxxxxxx/ >> >> the key being a very nice drop in iTLB-load-misses, so it looks like your code is >> having the right effect. >> >> What about the call to filemap_nr_thps_dec() in unaccount_page_cache_page() - does >> that need an smp_mb() as well? >> >> -- Bill >> >>> On Apr 5, 2021, at 6:15 PM, Collin Fijalkovich <cfijalkovich@xxxxxxxxxx> wrote: >>> >>> v2 has been uploaded with performance testing results: >>> https://lore.kernel.org/patchwork/patch/1408266/ >>> >>> >>> >>> On Mon, Mar 22, 2021 at 2:59 PM Collin Fijalkovich >>> <cfijalkovich@xxxxxxxxxx> wrote: >>>> >>>> Transparent huge pages are supported for read-only non-shmem filesystems, >>>> but are only used for vmas with VM_DENYWRITE. This condition ensures that >>>> file THPs are protected from writes while an application is running >>>> (ETXTBSY). Any existing file THPs are then dropped from the page cache >>>> when a file is opened for write in do_dentry_open(). Since sys_mmap >>>> ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas >>>> produced by execve(). >>>> >>>> Systems that make heavy use of shared libraries (e.g. Android) are unable >>>> to apply VM_DENYWRITE through the dynamic linker, preventing them from >>>> benefiting from the resultant reduced contention on the TLB. >>>> >>>> This patch reduces the constraint on file THPs allowing use with any >>>> executable mapping from a file not opened for write (see >>>> inode_is_open_for_write()). It also introduces additional conditions to >>>> ensure that files opened for write will never be backed by file THPs. >>>> >>>> Restricting the use of THPs to executable mappings eliminates the risk that >>>> a read-only file later opened for write would encounter significant >>>> latencies due to page cache truncation. >>>> >>>> The ld linker flag '-z max-page-size=(hugepage size)' can be used to >>>> produce executables with the necessary layout. The dynamic linker must >>>> map these file's segments at a hugepage size aligned vma for the mapping to >>>> be backed with THPs. >>>> >>>> Signed-off-by: Collin Fijalkovich <cfijalkovich@xxxxxxxxxx> >>>> --- >>>> fs/open.c | 13 +++++++++++-- >>>> mm/khugepaged.c | 16 +++++++++++++++- >>>> 2 files changed, 26 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/fs/open.c b/fs/open.c >>>> index e53af13b5835..f76e960d10ea 100644 >>>> --- a/fs/open.c >>>> +++ b/fs/open.c >>>> @@ -852,8 +852,17 @@ static int do_dentry_open(struct file *f, >>>> * XXX: Huge page cache doesn't support writing yet. Drop all page >>>> * cache for this file before processing writes. >>>> */ >>>> - if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping)) >>>> - truncate_pagecache(inode, 0); >>>> + if (f->f_mode & FMODE_WRITE) { >>>> + /* >>>> + * Paired with smp_mb() in collapse_file() to ensure nr_thps >>>> + * is up to date and the update to i_writecount by >>>> + * get_write_access() is visible. Ensures subsequent insertion >>>> + * of THPs into the page cache will fail. >>>> + */ >>>> + smp_mb(); >>>> + if (filemap_nr_thps(inode->i_mapping)) >>>> + truncate_pagecache(inode, 0); >>>> + } >>>> >>>> return 0; >>>> >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>> index a7d6cb912b05..4c7cc877d5e3 100644 >>>> --- a/mm/khugepaged.c >>>> +++ b/mm/khugepaged.c >>>> @@ -459,7 +459,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma, >>>> >>>> /* Read-only file mappings need to be aligned for THP to work. */ >>>> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file && >>>> - (vm_flags & VM_DENYWRITE)) { >>>> + !inode_is_open_for_write(vma->vm_file->f_inode) && >>>> + (vm_flags & VM_EXEC)) { >>>> return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff, >>>> HPAGE_PMD_NR); >>>> } >>>> @@ -1872,6 +1873,19 @@ static void collapse_file(struct mm_struct *mm, >>>> else { >>>> __mod_lruvec_page_state(new_page, NR_FILE_THPS, nr); >>>> filemap_nr_thps_inc(mapping); >>>> + /* >>>> + * Paired with smp_mb() in do_dentry_open() to ensure >>>> + * i_writecount is up to date and the update to nr_thps is >>>> + * visible. Ensures the page cache will be truncated if the >>>> + * file is opened writable. >>>> + */ >>>> + smp_mb(); >>>> + if (inode_is_open_for_write(mapping->host)) { >>>> + result = SCAN_FAIL; >>>> + __mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr); >>>> + filemap_nr_thps_dec(mapping); >>>> + goto xa_locked; >>>> + } >>>> } >>>> >>>> if (nr_none) { >>>> -- >>>> 2.31.0.rc2.261.g7f71774620-goog >>>> >>> >>