On Thu, Oct 12, 2017 at 03:33:23PM +0200, Jan Kara wrote: > > return; > > > > - if (dax_mapping(mapping)) { > > - dax_delete_mapping_entry(mapping, index); > > - return; > > + dax = dax_mapping(mapping); > > + if (!dax) > > + spin_lock_irq(&mapping->tree_lock); > > + > > + for (i = ei, j = ei; i < pagevec_count(pvec); i++) { > > + struct page *page = pvec->pages[i]; > > + pgoff_t index = indices[i]; > > + > > + if (!radix_tree_exceptional_entry(page)) { > > + pvec->pages[j++] = page; > > + continue; > > + } > > + > > + if (unlikely(dax)) { > > + dax_delete_mapping_entry(mapping, index); > > + continue; > > + } > > + > > + __clear_shadow_entry(mapping, index, page); > > } > > - clear_shadow_entry(mapping, index, entry); > > + > > + if (!dax) > > + spin_unlock_irq(&mapping->tree_lock); > > + pvec->nr = j; > > } > > When I look at this I think could make things cleaner. I have the following > observations: > > 1) All truncate_inode_pages(), invalidate_mapping_pages(), > invalidate_inode_pages2_range() essentially do very similar thing and would > benefit from a similar kind of batching. > While this is true, the benefit is much more marginal that I didn't feel the level of churn was justified. Primarily it would help fadvise() and invalidating when buffered and direct IO is mixed. I didn't think it would be that much cleaner as a result so I left it. > 2) As you observed and measured, batching of radix tree operations makes > sense both when removing pages and shadow entries, I'm very confident it > would make sense for DAX exceptional entries as well. > True, but I didn't have a suitable setup for testing DAX so I wasn't comfortable with making the change. dax_delete_mapping_entry can sleep but it should be as simple as not taking the spinlock in dax_delete_mapping_entry and always locking in truncate_exceptional_pvec_entries. dax is already releasing the mapping->tree_lock if it needs to sleep and I didn't spot any other gotcha but I'd prefer that change was done by someone that can verify it works properly. > 3) In all cases (i.e., those three functions and for all entry types) the > workflow seems to be: > * lockless lookup of entries > * prepare entry for reclaim (or determine it is not elligible) > * lock mapping->tree_lock > * verify entry is still elligible for reclaim (otherwise bail) > * clear radix tree entry > * unlock mapping->tree_lock > * final cleanup of the entry > > So I'm wondering whether we cannot somehow refactor stuff so that batching > of radix tree operations could be shared and we wouldn't have to duplicate > it in all those cases. > > But it would be rather large overhaul of the code so it may be a bit out of > scope for these improvements... > I think it would be out of scope for this improvement but I can look into it if the series is accepted. I think it would be a lot of churn for fairly marginal benefit though. > > @@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space *mapping, > > } > > > > if (radix_tree_exceptional_entry(page)) { > > - truncate_exceptional_entry(mapping, index, > > - page); > > + if (ei != PAGEVEC_SIZE) > > + ei = i; > > This should be ei == PAGEVEC_SIZE I think. > > Otherwise the patch looks good to me so feel free to add: > Fixed. > Reviewed-by: Jan Kara <jack@xxxxxxx> Thanks -- Mel Gorman SUSE Labs