On Tue, Apr 25, 2017 at 01:10:43PM +0200, Jan Kara wrote:
<>
> Hum, but now thinking more about it I have hard time figuring out why write
> vs fault cannot actually still race:
>
> CPU1 - write(2)                       CPU2 - read fault
>
>                                       dax_iomap_pte_fault()
>                                         ->iomap_begin() - sees hole
> dax_iomap_rw()
>   iomap_apply()
>     ->iomap_begin - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - there's nothing to invalidate
>                                         grab_mapping_entry()
>                                         - we add zero page in the radix
>                                           tree & map it to page tables
>
> Similarly read vs write fault may end up racing in a wrong way and try to
> replace already existing exceptional entry with a hole page?

Yep, this race seems real to me, too.  This seems very much like the issues
that exist when a thread is doing direct I/O.  One thread is doing I/O to an
intermediate buffer (page cache for direct I/O case, zero page for us), and
the other is going around it directly to media, and they can get out of sync.

IIRC the direct I/O code looked something like:

1/ invalidate existing mappings
2/ do direct I/O to media
3/ invalidate mappings again, just in case.  Should be cheap if there weren't
   any conflicting faults.  This makes sure any new allocations we made are
   faulted in.

I guess one option would be to replicate that logic in the DAX I/O path, or we
could try and enhance our locking so page faults can't race with I/O since
both can allocate blocks.

I'm not sure, but will think on it.
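
To make the interleaving in the diagram concrete, here is a toy userspace model in C. It is only a sketch: every name in it (toy_file, toy_invalidate, and so on) is made up for illustration and none of it is kernel code or a kernel API. It replays the one ordering from the diagram (fault sees a hole, write allocates and invalidates nothing, fault installs the zero page) and reports whether a zero page is left mapped over an allocated block, with and without the "invalidate again" step 3/.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum toy_entry { TOY_NONE, TOY_ZERO_PAGE, TOY_BLOCK };

struct toy_file {
	bool block_allocated;    /* has the filesystem allocated the block? */
	enum toy_entry mapping;  /* what the radix tree / page tables hold  */
};

/* Drop a zero-page entry so a later fault has to look the block up again. */
static void toy_invalidate(struct toy_file *f)
{
	if (f->mapping == TOY_ZERO_PAGE)
		f->mapping = TOY_NONE;
}

/* First half of the read fault: the lookup. Returns true for a hole. */
static bool toy_fault_lookup(const struct toy_file *f)
{
	return !f->block_allocated;
}

/* Second half of the read fault: install an entry from the earlier,
 * possibly stale, lookup result. */
static void toy_fault_install(struct toy_file *f, bool saw_hole)
{
	f->mapping = saw_hole ? TOY_ZERO_PAGE : TOY_BLOCK;
}

/* Write path up to the first invalidation: allocate, then invalidate. */
static void toy_write_alloc_and_invalidate(struct toy_file *f)
{
	f->block_allocated = true;
	toy_invalidate(f);
}

/* Replay the interleaving from the diagram. Returns true if a zero page
 * is still mapped over an allocated block at the end. */
bool toy_race_leaves_stale_zero_page(bool invalidate_after_io)
{
	struct toy_file f = { .block_allocated = false, .mapping = TOY_NONE };

	bool saw_hole = toy_fault_lookup(&f);   /* CPU2: sees hole          */
	assert(saw_hole);
	toy_write_alloc_and_invalidate(&f);     /* CPU1: nothing to drop    */
	toy_fault_install(&f, saw_hole);        /* CPU2: installs zero page */

	if (invalidate_after_io)
		toy_invalidate(&f);             /* CPU1: step 3/            */

	return f.block_allocated && f.mapping == TOY_ZERO_PAGE;
}
```

Calling toy_race_leaves_stale_zero_page(false) models the current ordering, and passing true adds the second invalidation.  Note the model covers only this single interleaving: it says nothing about a fault that installs its entry after the second invalidation, which is the case the locking alternative would have to address.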