On Thu, May 18, 2017 at 09:26:08AM +0200, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> The patch below does not apply to the 4.10-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
> 
> thanks,
> 
> greg k-h

The following patch from Jan was sent in v3 of his series, before he had
merged it with my tracepoint changes, and applies cleanly to v4.10.6.

This is the backport of the following upstream commit:

commit 13e451fdc1af ("dax: fix data corruption when fault races with write")

This patch relies on the previous patch in the series to work correctly,
so you'll need to wait for Jan to backport that one before applying this
one. Here's the commit you need from Jan:

commit fb26a1cbed8c ("ext4: return to starting transaction in
ext4_dax_huge_fault()")

Tomorrow I'll send you a backport of patch 5 in that series, i.e. a
backport of this commit:

commit 876f29460cbd ("dax: fix PMD data corruption when fault races with
write")

Obviously there's no need to wait for that one before applying this one. :)

--- >8 ---

From a9336fe67e892da2fc167e60db6d6069b0a786b1 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Tue, 9 May 2017 14:18:37 +0200
Subject: [PATCH] dax: Fix data corruption when fault races with write

Currently a DAX read fault can race with write(2) in the following way:

CPU1 - write(2)                 CPU2 - read fault
                                dax_iomap_pte_fault()
                                  ->iomap_begin() - sees hole
dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate
                                  grab_mapping_entry()
                                  - we add zero page in the radix tree
                                    and map it to page tables

The result is that the hole page is mapped into page tables (and thus
zeros are seen in mmap) while the file has data written in that place.

Fix the problem by locking the exception entry before mapping blocks
for the fault. That way we are sure that the
invalidate_inode_pages2_range() call for the racing write will either
block on the entry lock, waiting for the fault to finish (and unmap
stale page tables after that), or the read fault will see the blocks
already allocated by write(2).

Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
CC: stable@xxxxxxxxxxxxxxx
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
---
 fs/dax.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a39b404..4bffc4a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1156,23 +1156,23 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page)
 		flags |= IOMAP_WRITE;
 
+	entry = grab_mapping_entry(mapping, vmf->pgoff, 0);
+	if (IS_ERR(entry))
+		return dax_fault_return(PTR_ERR(entry));
+
 	/*
 	 * Note that we don't bother to use iomap_apply here: DAX required
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
 	 */
 	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
-	if (error)
-		return dax_fault_return(error);
-	if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
-		vmf_ret = dax_fault_return(-EIO);	/* fs corruption? */
-		goto finish_iomap;
+	if (error) {
+		vmf_ret = dax_fault_return(error);
+		goto unlock_entry;
 	}
-
-	entry = grab_mapping_entry(mapping, vmf->pgoff, 0);
-	if (IS_ERR(entry)) {
-		vmf_ret = dax_fault_return(PTR_ERR(entry));
-		goto finish_iomap;
+	if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
+		error = -EIO;		/* fs corruption? */
+		goto error_finish_iomap;
 	}
 
 	sector = dax_iomap_sector(&iomap, pos);
@@ -1194,13 +1194,13 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		}
 
 		if (error)
-			goto error_unlock_entry;
+			goto error_finish_iomap;
 
 		__SetPageUptodate(vmf->cow_page);
 		vmf_ret = finish_fault(vmf);
 		if (!vmf_ret)
 			vmf_ret = VM_FAULT_DONE_COW;
-		goto unlock_entry;
+		goto finish_iomap;
 	}
 
 	switch (iomap.type) {
@@ -1220,7 +1220,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	case IOMAP_HOLE:
 		if (!(vmf->flags & FAULT_FLAG_WRITE)) {
 			vmf_ret = dax_load_hole(mapping, &entry, vmf);
-			goto unlock_entry;
+			goto finish_iomap;
 		}
 		/*FALLTHRU*/
 	default:
@@ -1229,10 +1229,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		break;
 	}
 
- error_unlock_entry:
+ error_finish_iomap:
 	vmf_ret = dax_fault_return(error) | major;
- unlock_entry:
-	put_locked_mapping_entry(mapping, vmf->pgoff, entry);
  finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PAGE_SIZE;
@@ -1247,6 +1245,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		 */
 		ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap);
 	}
+ unlock_entry:
+	put_locked_mapping_entry(mapping, vmf->pgoff, entry);
 	return vmf_ret;
 }
 EXPORT_SYMBOL_GPL(dax_iomap_fault);
-- 
2.9.4