On Sun, Nov 24, 2013 at 06:38:25PM -0500, Johannes Weiner wrote: > Reclaim will be leaving shadow entries in the page cache radix tree > upon evicting the real page. As those pages are found from the LRU, > an iput() can lead to the inode being freed concurrently. At this > point, reclaim must no longer install shadow pages because the inode > freeing code needs to ensure the page tree is really empty. > > Add an address_space flag, AS_EXITING, that the inode freeing code > sets under the tree lock before doing the final truncate. Reclaim > will check for this flag before installing shadow pages. > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> .... > @@ -545,10 +546,25 @@ static void evict(struct inode *inode) > */ > inode_wait_for_writeback(inode); > > + /* > + * Page reclaim can not do iput() and thus can race with the > + * inode teardown. Tell it when the address space is exiting, > + * so that it does not install eviction information after the > + * final truncate has begun. > + * > + * As truncation uses a lockless tree lookup, acquire the > + * spinlock to make sure any ongoing tree modification that > + * does not see AS_EXITING is completed before starting the > + * final truncate. > + */ > + spin_lock_irq(&inode->i_data.tree_lock); > + mapping_set_exiting(&inode->i_data); > + spin_unlock_irq(&inode->i_data.tree_lock); > + > if (op->evict_inode) { > op->evict_inode(inode); > } else { > - if (inode->i_data.nrpages) > + if (inode->i_data.nrpages || inode->i_data.nrshadows) > truncate_inode_pages(&inode->i_data, 0); > clear_inode(inode); > } Ok, so what I see here is that we need a wrapper function that handles setting the AS_EXITING flag and doing the "final" truncate_inode_pages() call, and the locking for the AS_EXITING flag moved into mapping_set_exiting() That is, because this AS_EXITING flag and it's locking constraints are directly related to the upcoming truncate_inode_pages() call, I'd prefer to see a helper that captures that relationship used in all the filesystem code. e.g: void truncate_inode_pages_final(struct address_space *mapping) { spin_lock_irq(&mapping->tree_lock); mapping_set_exiting(mapping); spin_unlock_irq(&mapping->tree_lock); if (inode->i_data.nrpages || inode->i_data.nrshadows) truncate_inode_pages_range(mapping, 0, (loff_t)-1); } And document it in Documentation/filesystems/porting as a mandatory function to be called from ->evict_inode() implementations before calling clear_inode(). You can then replace all the direct calls to truncate_inode_pages() in the evict_inode() path with a call to truncate_inode_pages_final(). As it is, I'd really like to see that unconditional irq disable go away from this code - disabling and enabling interrupts for every single inode we reclaim is going to add significant overhead to this hot code path. And given that: > +static inline void mapping_set_exiting(struct address_space *mapping) > +{ > + set_bit(AS_EXITING, &mapping->flags); > +} > + > +static inline int mapping_exiting(struct address_space *mapping) > +{ > + return test_bit(AS_EXITING, &mapping->flags); > +} these atomic bit ops, why do we need to take the tree_lock and disable irqs in evict() to set this bit if there's nothing to truncate on the inode? i.e. something like this: void truncate_inode_pages_final(struct address_space *mapping) { mapping_set_exiting(mapping); if (inode->i_data.nrpages || inode->i_data.nrshadows) { /* * spinlock barrier to ensure all modifications are * complete before we do the final truncate */ spin_lock_irq(&mapping->tree_lock); spin_unlock_irq(&mapping->tree_lock); truncate_inode_pages_range(mapping, 0, (loff_t)-1); } and thereby avoiding the mapping lock altogether for inodes that do not require it to be taken? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html