On Tue, Oct 15, 2013 at 10:05:26PM -0400, Rik van Riel wrote:
> On 10/15/2013 07:41 PM, Dave Chinner wrote:
> > On Tue, Oct 15, 2013 at 01:41:28PM -0400, Johannes Weiner wrote:
>
> >> I'm not forgetting about them, I just track them very coarsely by
> >> linking up address spaces and then lazily enforce their upper limit
> >> when memory is tight by using the shrinker callback. The assumption
> >> was that actually scanning them is such a rare event that we trade the
> >> rare computational costs for smaller memory consumption most of the
> >> time.
> >
> > Sure, I understand the tradeoff that you made. But there's nothing
> > worse than a system that slows down unpredictably because of some
> > magic threshold in some subsystem has been crossed and
> > computationally expensive operations kick in.
>
> The shadow shrinker should remove the radix nodes with
> the oldest shadow entries first, so true LRU should actually
> work for the radix tree nodes.
>
> Actually, since we only care about the age of the youngest
> shadow entry in each radix tree node, FIFO will be the same
> as LRU for that list.
>
> That means the shrinker can always just take the radix tree
> nodes off the end.

Right, but it can't necessarily free the node as it may still have
pointers to pages in it. In that case, it would have to simply
rotate the page to the end of the LRU again.

Unless, of course, we kept track of the number of exceptional
entries in a node and didn't add it to the reclaim list until there
were no non-exceptional entries in the node....

> >> But it
> >> looks like tracking radix tree nodes with a list and backpointers to
> >> the mapping object for the lock etc. will be a major pain in the ass.
> >
> > Perhaps so - it may not work out when we get down to the fine
> > details...
>
> I suspect that a combination of lifetime rules (inode cannot
> disappear until all the radix tree nodes) and using RCU free
> for the radix tree nodes, and the inodes might do the trick.
>
> That would mean that, while holding the rcu read lock, the
> back pointer from a radix tree node to the inode will always
> point to valid memory.

Yes, that is what I was thinking...

> That allows the shrinker to lock the inode, and verify that
> the inode is still valid, before it attempts to rcu free the
> radix tree node with shadow entries.

Lock the mapping, not the inode. The radix tree is protected by the
mapping_lock, not an inode lock. i.e. I'd hope that this can all be
contained within the struct address_space and not require any
knowledge of inodes or inode lifecycles at all.

> It also means that locking only needs to be in the inode,
> and on the LRU list for shadow radix tree nodes.
>
> Does that sound sane?
>
> Am I overlooking something?

It's pretty much along the same lines of what I was thinking, but
let's see what Johannes thinks.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
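
For illustration only, here is a minimal userspace sketch of the
counting idea raised in the reply above: a node goes on the reclaim
list only once it holds no page pointers, so the shrinker can take
nodes off the tail without ever having to rotate one. Everything here
is an assumption made for the example. The names (struct shadow_node,
struct shadow_list, shadow_node_evict_page() and friends) are
hypothetical and are not kernel APIs, and locking, RCU freeing and the
back pointer to the mapping are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree node; not the kernel's struct. */
struct shadow_node {
	unsigned int nr_pages;   /* slots holding real page pointers */
	unsigned int nr_shadows; /* slots holding shadow (exceptional) entries */
	bool on_list;            /* currently linked into the reclaim list? */
	struct shadow_node *prev, *next;
};

/* FIFO reclaim list: nodes enter at the head and are reclaimed at the tail. */
struct shadow_list {
	struct shadow_node head; /* sentinel, never reclaimed */
	size_t nr;
};

static void shadow_list_init(struct shadow_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->nr = 0;
}

static void shadow_list_unlink(struct shadow_list *l, struct shadow_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->on_list = false;
	l->nr--;
}

static void shadow_list_add_head(struct shadow_list *l, struct shadow_node *n)
{
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
	n->on_list = true;
	l->nr++;
}

/* Re-evaluate list membership after a node's counters change. */
static void shadow_node_update(struct shadow_list *l, struct shadow_node *n)
{
	/* Reclaimable only when shadows are all that is left in the node. */
	bool reclaimable = n->nr_pages == 0 && n->nr_shadows > 0;

	if (reclaimable && !n->on_list)
		shadow_list_add_head(l, n);
	else if (!reclaimable && n->on_list)
		shadow_list_unlink(l, n);
}

/* A page was evicted and its slot now holds a shadow entry. */
static void shadow_node_evict_page(struct shadow_list *l, struct shadow_node *n)
{
	assert(n->nr_pages > 0);
	n->nr_pages--;
	n->nr_shadows++;
	shadow_node_update(l, n);
}

/* A page was brought back in over a shadow entry (refault). */
static void shadow_node_refault(struct shadow_list *l, struct shadow_node *n)
{
	assert(n->nr_shadows > 0);
	n->nr_shadows--;
	n->nr_pages++;
	shadow_node_update(l, n);
}

/* Shrinker side: reclaim up to nr_to_scan nodes from the tail (oldest). */
static size_t shadow_list_shrink(struct shadow_list *l, size_t nr_to_scan)
{
	size_t freed = 0;

	while (nr_to_scan-- && l->nr > 0) {
		struct shadow_node *n = l->head.prev;

		shadow_list_unlink(l, n);
		n->nr_shadows = 0; /* real code would free the node here */
		freed++;
	}
	return freed;
}
```

In this model a node that is on the list holds no pages, so it cannot
gain a newer shadow entry while it sits there, and plain insertion
order is enough. A caller would invoke shadow_node_evict_page() or
shadow_node_refault() whenever a slot changes state and call
shadow_list_shrink() from the shrinker callback.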