On Thu, 24 Jan 2019, Dominique Martinet wrote:

> Jiri, you've offered resubmitting the last two patches properly, can you
> incorporate this change or should I just send this directly? (I'd take
> most of your commit message and add your name somewhere)

I've been running some basic smoke testing with the kernel from

	https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/log/?h=pagecache-sidechannel-v2

(attaching the respective two patches to apply on top of latest Linus'
tree to this mail as well), and everything looks good so far.

Thanks,

-- 
Jiri Kosina
SUSE Labs

From 9810565f1d5f966a84900cdcb85e33aa7571afbe Mon Sep 17 00:00:00 2001
From: Jiri Kosina <jkosina@xxxxxxx>
Date: Wed, 16 Jan 2019 20:53:17 +0100
Subject: [PATCH 1/2] mm/mincore: make mincore() more conservative

The semantics of what mincore() considers to be resident is not completely
clear, but Linux has always (since 2.3.52, which is when mincore() was
initially done) treated it as "page is available in page cache".

That's potentially a problem, as that [in]directly exposes meta-information
about pagecache / memory mapping state even about memory not strictly
belonging to the process executing the syscall, opening possibilities for
sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belong to files that the calling
process could (if it tried to) successfully open for writing.

Originally-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Originally-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
---
 mm/mincore.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..747a4907a3ac 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,14 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	return vma_is_anonymous(vma) ||
+		(vma->vm_file &&
+			(inode_owner_or_capable(file_inode(vma->vm_file))
+			 || inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0));
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +197,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = (end - addr) >> PAGE_SHIFT;
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;
-- 
2.12.3

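For readers who want to see what the mincore() interface looks like from userspace, here is a minimal sketch (not part of the patch). The helper name count_resident_pages() is hypothetical. It maps a file read-only, asks mincore() for the residency vector, and counts the pages reported resident. With the change above applied, a caller that neither owns the file nor could open it for writing would presumably see every page reported as resident, whatever the real pagecache state is.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: return the number of pages
 * of `path` that mincore() reports as resident, or -1 on any error
 * (including an empty file, which cannot be mapped).
 */
long count_resident_pages(const char *path)
{
	long resident = -1;
	long page_size = sysconf(_SC_PAGESIZE);
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	if (page_size > 0 && fstat(fd, &st) == 0 && st.st_size > 0) {
		size_t len = (size_t)st.st_size;
		size_t pages = (len + (size_t)page_size - 1) / (size_t)page_size;
		void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
		unsigned char *vec = malloc(pages);

		/* mincore() fills one byte per page; bit 0 means "resident". */
		if (map != MAP_FAILED && vec && mincore(map, len, vec) == 0) {
			resident = 0;
			for (size_t i = 0; i < pages; i++)
				resident += vec[i] & 1;
		}

		free(vec);
		if (map != MAP_FAILED)
			munmap(map, len);
	}

	close(fd);
	return resident;
}
```

Calling count_resident_pages() on a file the caller owns should still reflect the real pagecache state, since such mappings pass the can_do_mincore() check.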
From f287185fc5e0ffbbb380f2d68dd19290715829a8 Mon Sep 17 00:00:00 2001
From: Jiri Kosina <jkosina@xxxxxxx>
Date: Wed, 16 Jan 2019 21:06:58 +0100
Subject: [PATCH 2/2] mm/filemap: initiate readahead even if IOCB_NOWAIT is set
 for the I/O

preadv2(RWF_NOWAIT) can be used to open a side-channel to pagecache contents,
as it reveals metadata about residency of pages in pagecache.

If preadv2(RWF_NOWAIT) returns immediately, it provides clear "page not
resident" information, and vice versa.

Close that sidechannel by always initiating readahead on the cache if we
encounter a cache miss for preadv2(RWF_NOWAIT); with that in place, probing
the pagecache residency itself will actually populate the cache, making the
sidechannel useless.

Originally-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
---
 mm/filemap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9f5e323e883e..7bcdd36e629d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
 		page = find_get_page(mapping, index);
 		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
 			page_cache_sync_readahead(mapping,
 					ra, filp,
 					index, last_index - index);
-- 
2.12.3
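
As an illustration of the probe this second patch is aimed at, here is a rough userspace sketch (not part of the patch). The helper name nowait_read_hits_cache() is hypothetical, and the fallback definition of RWF_NOWAIT is only there in case the libc headers are too old to provide it. The helper issues a one-byte preadv2(RWF_NOWAIT) read and classifies the result. On a kernel without the change, EAGAIN would indicate a cache miss. With the change, the same miss should additionally kick off readahead, so repeating the probe no longer observes an undisturbed cache.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008	/* value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns  1 if a one-byte RWF_NOWAIT read at `offset` was served at once,
 *          0 if the kernel answered EAGAIN (data not immediately available),
 *         -1 on EOF or any other error (bad fd, unsupported filesystem, ...).
 */
int nowait_read_hits_cache(int fd, off_t offset)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	ssize_t n = preadv2(fd, &iov, 1, offset, RWF_NOWAIT);

	if (n > 0)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

Note that not every filesystem supports RWF_NOWAIT for buffered reads, which is why anything other than a successful read or EAGAIN is folded into -1 here.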