On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
> > > > > > however, not called from the fs layer
> > > > >
> > > > > Is that true for filesystems that have directories in
> > > > > the page cache?
> > > >
> > > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > > and ceph and those seem OK to me. Which filesystems you have in mind?
> > >
> > > Just about every major filesystem calls filemap_fault through the
> > > .fault callout.
> >
> > That is right but the callback is called from the VM layer where we
> > obviously do not take any fs locks (we are holding only mmap_sem
> > for reading).
> > Those who call filemap_fault directly (ocfs2 and ceph) and those
> > who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> > kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> > POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> > to be used from the reclaim context.
> >
> > Or did I miss your point? Are you concerned about some fs overloading
> > filemap_fault and do some locking before delegating to filemap_fault?
>
> The latter:
>
> https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

I will have a look at the code to see what we can do about it.

> > > GFP_KERNEL allocation for mappings is simply wrong.
> > > All mapping allocations where the caller cannot pass a gfp_mask need
> > > to obey the mapping_gfp_mask that is set by the mapping owner....
> >
> > Hmm, I thought this is true only when the function might be called from
> > the fs path.
>
> How do you know in, say, mpage_readpages, you aren't being called
> from a fs path that holds locks? e.g. we can get there from ext4
> doing readdir, so it is holding an i_mutex lock at that point.
>
> Many other paths into mpage_readpages don't hold locks, but there
> are some that do, and those that do need functions like this to
> obey the mapping_gfp_mask because it is set appropriately for the
> allocation context of the inode that owns the mapping....

What about the following?
---
From 5d905cb291138d61bbab056845d6e53bc4451ec8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Thu, 19 Mar 2015 14:56:56 +0100
Subject: [PATCH 1/2] mm: do not ignore mapping_gfp_mask in page cache allocation paths

page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path. Anyway
it is better to obey mapping gfp mask and prevent later breakage.

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 fs/ntfs/file.c | 2 +-
 fs/splice.c    | 2 +-
 mm/filemap.c   | 6 ++++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1da9b2d184dc..568c9dbc7e61 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -422,7 +422,7 @@ static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
 				}
 			}
 			err = add_to_page_cache_lru(*cached_page, mapping, index,
-					GFP_KERNEL);
+					GFP_KERNEL & mapping_gfp_mask(mapping));
 			if (unlikely(err)) {
 				if (err == -EEXIST)
 					continue;
diff --git a/fs/splice.c b/fs/splice.c
index 75c6058eabf2..71f6c51f019a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -360,7 +360,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 				break;
 
 			error = add_to_page_cache_lru(page, mapping, index,
-						GFP_KERNEL);
+					GFP_KERNEL & mapping_gfp_mask(mapping));
 			if (unlikely(error)) {
 				page_cache_release(page);
 				if (error == -EEXIST)
diff --git a/mm/filemap.c b/mm/filemap.c
index 968cd8e03d2e..4756cba51655 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1656,7 +1656,8 @@ no_cached_page:
 			goto out;
 		}
 		error = add_to_page_cache_lru(page, mapping,
-						index, GFP_KERNEL);
+				index,
+				GFP_KERNEL & mapping_gfp_mask(mapping));
 		if (error) {
 			page_cache_release(page);
 			if (error == -EEXIST) {
@@ -1756,7 +1757,8 @@ static int page_cache_read(struct file *file, pgoff_t offset)
 		if (!page)
 			return -ENOMEM;
 
-		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
+		ret = add_to_page_cache_lru(page, mapping, offset,
+				GFP_KERNEL & mapping_gfp_mask(mapping));
 		if (ret == 0)
 			ret = mapping->a_ops->readpage(file, page);
 		else if (ret == -EEXIST)
--
2.1.4

--
Michal Hocko
SUSE Labs
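
As a rough illustration of what the `GFP_KERNEL & mapping_gfp_mask(mapping)` expression in the patch does, here is a small user-space sketch. Everything prefixed `demo_` or `DEMO_` is hypothetical, and the flag bit values are made up for the example; they are not the kernel's definitions. The only assumption carried over is that GFP_NOFS is GFP_KERNEL without the bit that allows reclaim to call into the filesystem.

```c
#include <stdbool.h>

/* Illustrative flag bits only; NOT the kernel's real values. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_WAIT	0x1u	/* caller may sleep */
#define DEMO_GFP_IO	0x2u	/* reclaim may start block I/O */
#define DEMO_GFP_FS	0x4u	/* reclaim may call into the filesystem */

#define DEMO_GFP_KERNEL	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS	(DEMO_GFP_WAIT | DEMO_GFP_IO)

/* Hypothetical stand-in for struct address_space; only the mask matters. */
struct demo_mapping {
	demo_gfp_t gfp_mask;
};

/* Models mapping_gfp_mask(): return the mask chosen by the mapping owner. */
static demo_gfp_t demo_mapping_gfp_mask(const struct demo_mapping *mapping)
{
	return mapping->gfp_mask;
}

/*
 * Models the mask the patch passes to add_to_page_cache_lru():
 * start from GFP_KERNEL and drop every bit the mapping does not allow.
 */
static demo_gfp_t demo_page_cache_gfp(const struct demo_mapping *mapping)
{
	return DEMO_GFP_KERNEL & demo_mapping_gfp_mask(mapping);
}

/* True if an allocation with this mask could recurse into the fs. */
static bool demo_may_enter_fs(demo_gfp_t gfp)
{
	return (gfp & DEMO_GFP_FS) != 0;
}
```

For a mapping whose mask is `DEMO_GFP_NOFS`, `demo_page_cache_gfp()` returns a mask without `DEMO_GFP_FS`, so `demo_may_enter_fs()` is false. For a mapping that allows `DEMO_GFP_KERNEL`, the intersection leaves the mask unchanged.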