On Wed, Oct 10, 2018 at 5:37 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 24 Aug 2018 17:45:42 +0200 Jan Kara <jack@xxxxxxx> wrote:
>
> > In DAX mode a write pagefault can race with write(2) in the following
> > way:
> >
> > CPU0                                    CPU1
> >                                         write fault for mapped zero page (hole)
> > dax_iomap_rw()
> >   iomap_apply()
> >     xfs_file_iomap_begin()
> >       - allocates blocks
> >     dax_iomap_actor()
> >       invalidate_inode_pages2_range()
> >         - invalidates radix tree entries in given range
> >                                         dax_iomap_pte_fault()
> >                                           grab_mapping_entry()
> >                                             - no entry found, creates empty
> >                                           ...
> >                                           xfs_file_iomap_begin()
> >                                             - finds already allocated block
> >                                           ...
> >                                           vmf_insert_mixed_mkwrite()
> >                                             - WARNs and does nothing because there
> >                                               is still zero page mapped in PTE
> >         unmap_mapping_pages()
> >
> > This race results in WARN_ON from insert_pfn() and is occasionally
> > triggered by fstest generic/344. Note that the race is otherwise
> > harmless as before write(2) on CPU0 is finished, we will invalidate page
> > tables properly and thus user of mmap will see modified data from
> > write(2) from that point on. So just restrict the warning only to the
> > case when the PFN in PTE is not zero page.
> >
> > ...
> >
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> >                          * in may not match the PFN we have mapped if the
> >                          * mapped PFN is a writeable COW page. In the mkwrite
> >                          * case we are creating a writable PTE for a shared
> > -                        * mapping and we expect the PFNs to match.
> > +                        * mapping and we expect the PFNs to match. If they
> > +                        * don't match, we are likely racing with block
> > +                        * allocation and mapping invalidation so just skip the
> > +                        * update.
> >                          */
> > -                       if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
> > +                       if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
> > +                               WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
> >                                 goto out_unlock;
> > +                       }
> >                         entry = *pte;
>
> Shouldn't we just remove the warning? We know it happens and we know
> why it happens and we know it's harmless. What's the point in scaring
> people?

tl;dr let's keep it.

I think this fix effectively pushes this into "can't happen" territory,
but if it does happen, our dax assumptions are off somewhere else. So, I
think this is useful for developers hacking around in the dax code to
make sure they aren't breaking some fundamental assumption.