On Thu, May 23, 2019 at 04:41:24PM -0700, Mike Kravetz wrote: > On 5/17/19 8:00 AM, gregkh@xxxxxxxxxxxxxxxxxxx wrote: > > > > The patch below does not apply to the 4.4-stable tree. > > If someone wants it applied there, or to any other stable or longterm > > tree, then please email the backport, including the original git commit > > id to <stable@xxxxxxxxxxxxxxx>. > > From: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > Date: Thu, 23 May 2019 13:52:15 -0700 > Subject: [PATCH] hugetlb: use same fault hash key for shared and private > mappings > > commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream. > > hugetlb uses a fault mutex hash table to prevent page faults of the > same pages concurrently. The key for shared and private mappings is > different. Shared keys off address_space and file index. Private > keys off mm and virtual address. Consider a private mappings of a > populated hugetlbfs file. A fault will map the page from the file > and if needed do a COW to map a writable page. > > Hugetlbfs hole punch uses the fault mutex to prevent mappings of file > pages. It uses the address_space file index key. However, private > mappings will use a different key and could race with this code to map > the file page. This causes problems (BUG) for the page cache remove > code as it expects the page to be unmapped. A sample stack is: > > page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) > kernel BUG at mm/filemap.c:169! > ... > RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 > ... > Call Trace: > __delete_from_page_cache+0x39/0x220 > delete_from_page_cache+0x45/0x70 > remove_inode_hugepages+0x13c/0x380 > ? __add_to_page_cache_locked+0x162/0x380 > hugetlbfs_fallocate+0x403/0x540 > ? _cond_resched+0x15/0x30 > ? __inode_security_revalidate+0x5d/0x70 > ? selinux_file_permission+0x100/0x130 > vfs_fallocate+0x13f/0x270 > ksys_fallocate+0x3c/0x80 > __x64_sys_fallocate+0x1a/0x20 > do_syscall_64+0x5b/0x180 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > There seems to be another potential COW issue/race with this approach > of different private and shared keys as noted in commit 8382d914ebf7 > ("mm, hugetlb: improve page-fault scalability"). > > Since every hugetlb mapping (even anon and private) is actually a file > mapping, just use the address_space index key for all mappings. This > results in potentially more hash collisions. However, this should not > be the common case. > > Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@xxxxxxxxxx > Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5 > Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages") > Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > --- > fs/hugetlbfs/inode.c | 7 ++----- > include/linux/hugetlb.h | 4 +--- > mm/hugetlb.c | 19 +++++-------------- > 3 files changed, 8 insertions(+), 22 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 27c4e2ac39a9..c9f288dbe734 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -414,9 +414,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, > if (next >= end) > break; > > - hash = hugetlb_fault_mutex_hash(h, current->mm, > - &pseudo_vma, > - mapping, next, 0); > + hash = hugetlb_fault_mutex_hash(h, mapping, next, 0); > mutex_lock(&hugetlb_fault_mutex_table[hash]); > > lock_page(page); > @@ -633,8 +631,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, > addr = index * hpage_size; > > /* mutex taken here, fault path and hole punch */ > - hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, > - index, addr); > + hash = hugetlb_fault_mutex_hash(h, mapping, index, addr); > mutex_lock(&hugetlb_fault_mutex_table[hash]); > > /* See if already present in mapping to avoid alloc/free */ Note, this backport causes this warning: fs/hugetlbfs/inode.c: In function hugetlbfs_fallocate: fs/hugetlbfs/inode.c:570:20: warning: unused variable mm [-Wunused-variable] struct mm_struct *mm = current->mm; ^~ So I'll go delete that line as well. thanks, greg k-h