On Mon, Jun 24, 2019 at 05:12:46PM -0700, Song Liu wrote:
> In previous patch, an application could put part of its text section in
> THP via madvise(). These THPs will be protected from writes when the
> application is still running (TXTBSY). However, after the application
> exits, the file is available for writes.
>
> This patch avoids writes to file THP by dropping page cache for the file
> when the file is open for write. A new counter nr_thps is added to struct
> address_space. In do_last(), if the file is open for write and nr_thps
> is non-zero, we drop page cache for the whole file.
>
> Reported-by: kbuild test robot <lkp@xxxxxxxxx>
> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> ---
>  fs/inode.c         |  3 +++
>  fs/namei.c         | 23 ++++++++++++++++++++++-
>  include/linux/fs.h | 32 ++++++++++++++++++++++++++++++++
>  mm/filemap.c       |  1 +
>  mm/khugepaged.c    |  4 +++-
>  5 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index df6542ec3b88..518113a4e219 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -181,6 +181,9 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->flags = 0;
>  	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
> +#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> +	atomic_set(&mapping->nr_thps, 0);
> +#endif
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
>  	mapping->writeback_index = 0;
> diff --git a/fs/namei.c b/fs/namei.c
> index 20831c2fbb34..3d95e94029cc 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3249,6 +3249,23 @@ static int lookup_open(struct nameidata *nd, struct path *path,
>  	return error;
>  }
>
> +/*
> + * The file is open for write, so it is not mmapped with VM_DENYWRITE. If
> + * it still has THP in page cache, drop the whole file from pagecache
> + * before processing writes. This helps us avoid handling write back of
> + * THP for now.
> + */
> +static inline void release_file_thp(struct file *file)
> +{
> +	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) {
> +		struct inode *inode = file_inode(file);
> +
> +		if (inode_is_open_for_write(inode) &&
> +		    filemap_nr_thps(inode->i_mapping))
> +			truncate_pagecache(inode, 0);
> +	}
> +}
> +
>  /*
>   * Handle the last step of open()
>   */
> @@ -3418,7 +3435,11 @@ static int do_last(struct nameidata *nd,
>  		goto out;
>  opened:
>  	error = ima_file_check(file, op->acc_mode);
> -	if (!error && will_truncate)
> +	if (error)
> +		goto out;
> +
> +	release_file_thp(file);
> +	if (will_truncate)
>  		error = handle_truncate(file);

This would seem better placed in do_dentry_open(), where we're done with
the namespace operation and actually work against the inode. Something
roughly like this?

diff --git a/fs/open.c b/fs/open.c
index b5b80469b93d..cae893edbab6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -799,6 +799,11 @@ static int do_dentry_open(struct file *f,
 		if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO)
 			return -EINVAL;
 	}
+
+	/* XXX: Huge page cache doesn't support writing yet */
+	if ((f->f_mode & FMODE_WRITE) && filemap_nr_thps(inode->i_mapping))
+		truncate_pagecache(inode, 0);
+
 	return 0;
 
 cleanup_all:
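
For anyone following the thread without a kernel tree at hand, the check suggested for do_dentry_open() can be modelled in plain userspace C. This is only a sketch under stated assumptions: the struct layouts, MODEL_FMODE_WRITE, and the model_* helpers are hypothetical stand-ins rather than kernel definitions, and "dropping the page cache" is reduced to zeroing two counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's FMODE_WRITE flag. */
#define MODEL_FMODE_WRITE 0x2u

/* Simplified model of struct address_space: only the counters we need. */
struct model_mapping {
	size_t nr_pages;	/* pages currently cached for the file */
	int nr_thps;		/* how many huge pages are cached */
};

/* Simplified model of struct file. */
struct model_file {
	unsigned int f_mode;
	struct model_mapping *mapping;
};

/* Models truncate_pagecache(inode, 0): forget every cached page. */
static void model_drop_pagecache(struct model_mapping *mapping)
{
	mapping->nr_pages = 0;
	mapping->nr_thps = 0;
}

/*
 * Models the proposed check at the end of do_dentry_open(): if the file
 * is being opened for write and still has huge pages cached, drop the
 * whole cache. Returns true when the cache was dropped.
 */
static bool model_open_check(struct model_file *f)
{
	if ((f->f_mode & MODEL_FMODE_WRITE) && f->mapping->nr_thps > 0) {
		model_drop_pagecache(f->mapping);
		return true;
	}
	return false;
}
```

In this model a read-only open leaves the cached pages alone, the first writable open of a mapping with nr_thps > 0 drops everything, and later writable opens find nothing left to drop.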