On Tue, Dec 20, 2016 at 2:23 PM, Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> wrote: > Currently dax_mapping_entry_mkclean() fails to clean and write protect the > pmd_t of a DAX PMD entry during an *sync operation. This can result in > data loss in the following sequence: > > 1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the > pmd_t dirty and writeable > 2) fsync, flushing out PMD data and cleaning the radix tree entry. We > currently fail to mark the pmd_t as clean and write protected. > 3) more mmap writes to the PMD. These don't cause any page faults since > the pmd_t is dirty and writeable. The radix tree entry remains clean. > 4) fsync, which fails to flush the dirty PMD data because the radix tree > entry was clean. > 5) crash - dirty data that should have been fsync'd as part of 4) could > still have been in the processor cache, and is lost. > > Fix this by marking the pmd_t clean and write protected in > dax_mapping_entry_mkclean(), which is called as part of the fsync > operation 2). This will cause the writes in step 3) above to generate page > faults where we'll re-dirty the PMD radix tree entry, resulting in flushes > in the fsync that happens in step 4). > > Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> > Cc: Jan Kara <jack@xxxxxxx> > Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush") > --- > fs/dax.c | 51 ++++++++++++++++++++++++++++++++++++--------------- > include/linux/mm.h | 2 -- > mm/memory.c | 4 ++-- > 3 files changed, 38 insertions(+), 19 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 5c74f60..ddcddfe 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -691,8 +691,8 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping, > pgoff_t index, unsigned long pfn) > { > struct vm_area_struct *vma; > - pte_t *ptep; > - pte_t pte; > + pte_t pte, *ptep = NULL; > + pmd_t *pmdp = NULL; > spinlock_t *ptl; > bool changed; > > @@ -707,21 +707,42 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping, > > address = pgoff_address(index, vma); > changed = false; > - if (follow_pte(vma->vm_mm, address, &ptep, &ptl)) > + if (follow_pte_pmd(vma->vm_mm, address, &ptep, &pmdp, &ptl)) > continue; > - if (pfn != pte_pfn(*ptep)) > - goto unlock; > - if (!pte_dirty(*ptep) && !pte_write(*ptep)) > - goto unlock; > > - flush_cache_page(vma, address, pfn); > - pte = ptep_clear_flush(vma, address, ptep); > - pte = pte_wrprotect(pte); > - pte = pte_mkclean(pte); > - set_pte_at(vma->vm_mm, address, ptep, pte); > - changed = true; > -unlock: > - pte_unmap_unlock(ptep, ptl); > + if (pmdp) { > +#ifdef CONFIG_FS_DAX_PMD > + pmd_t pmd; > + > + if (pfn != pmd_pfn(*pmdp)) > + goto unlock_pmd; > + if (!pmd_dirty(*pmdp) && !pmd_write(*pmdp)) > + goto unlock_pmd; > + > + flush_cache_page(vma, address, pfn); > + pmd = pmdp_huge_clear_flush(vma, address, pmdp); > + pmd = pmd_wrprotect(pmd); > + pmd = pmd_mkclean(pmd); > + set_pmd_at(vma->vm_mm, address, pmdp, pmd); > + changed = true; > +unlock_pmd: > + spin_unlock(ptl); > +#endif Can we please kill this ifdef? I know we've had problems with ARCH=um builds in the past with undefined pmd helpers, but to me that simply means we now need to extend the FS_DAX blacklist to include UML diff --git a/fs/Kconfig b/fs/Kconfig index c2a377cdda2b..661931fb0ce0 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -37,7 +37,7 @@ source "fs/f2fs/Kconfig" config FS_DAX bool "Direct Access (DAX) support" depends on MMU - depends on !(ARM || MIPS || SPARC) + depends on !(ARM || MIPS || SPARC || UML) help Direct Access (DAX) can be used on memory-backed block devices. If the block device supports DAX and the filesystem supports DAX, -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html