On Mon 01-05-17 16:38:55, Ross Zwisler wrote:
> > So for now I'm still more inclined to just stay with the radix tree lock as
> > is and just fix up the locking as I suggest and go for larger rewrite only
> > if we can demonstrate further performance wins.
>
> Sounds good.
>
> > WRT your second patch, if we go with the locking as I suggest, it is enough
> > to unmap the whole range after invalidate_inode_pages2() has cleared radix
> > tree entries (*) which will be much cheaper (for large writes) than doing
> > unmapping entry by entry.
>
> I'm still not convinced that it is safe to do the unmap in a separate step. I
> see your point about it being expensive to do a rmap walk to unmap each entry
> in __dax_invalidate_mapping_entry(), but I think we might need to because the
> unmap is part of the contract imposed by invalidate_inode_pages2_range() and
> invalidate_inode_pages2(). This exists in the header comment above each:
>
>  * Any pages which are found to be mapped into pagetables are unmapped prior
>  * to invalidation.
>
> If you look at the usage of invalidate_inode_pages2_range() in
> generic_file_direct_write() for example (which I realize we won't call for a
> DAX inode, but still), I think that it really does rely on the fact that
> invalidated pages are unmapped, right? If it didn't, and hole pages were
> mapped, the hole pages could remain mapped while a direct I/O write allocated
> blocks and then wrote real data.
>
> If we really want to unmap the entire range at once, maybe it would have to be
> done in invalidate_inode_pages2_range(), after the loop? My hesitation about
> this is that we'd be leaking yet more DAX special casing up into the
> mm/truncate.c code.
>
> Or am I missing something?

No, my thinking was to put the invalidation at the end of
invalidate_inode_pages2_range(). I agree it means more special-casing for
DAX in mm/truncate.c.

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR