On Wed, Nov 27, 2013 at 09:29:37AM +1100, Dave Chinner wrote:
> On Tue, Nov 26, 2013 at 04:27:25PM -0500, Johannes Weiner wrote:
> > On Tue, Nov 26, 2013 at 10:49:21AM +1100, Dave Chinner wrote:
> > > On Sun, Nov 24, 2013 at 06:38:28PM -0500, Johannes Weiner wrote:
> > > > Previously, page cache radix tree nodes were freed after reclaim
> > > > emptied out their page pointers.  But now reclaim stores shadow
> > > > entries in their place, which are only reclaimed when the inodes
> > > > themselves are reclaimed.  This is problematic for bigger files that
> > > > are still in use after they have a significant amount of their cache
> > > > reclaimed, without any of those pages actually refaulting.  The shadow
> > > > entries will just sit there and waste memory.  In the worst case, the
> > > > shadow entries will accumulate until the machine runs out of memory.
> ....
> > > ....
> > > > +	radix_tree_replace_slot(slot, page);
> > > > +	if (node) {
> > > > +		node->count++;
> > > > +		/* Installed page, can't be shadow-only anymore */
> > > > +		if (!list_empty(&node->lru))
> > > > +			list_lru_del(&workingset_shadow_nodes, &node->lru);
> > > > +	}
> > > > +	return 0;
> > >
> > > Hmmmmm - what's the overhead of direct management of LRU removal
> > > here? Most list_lru code uses lazy removal (i.e. via the shrinker)
> > > to avoid having to touch the LRU when adding new references to an
> > > object.....
> >
> > It's measurable in microbenchmarks, but not when any real IO is
> > involved.  The difference was in the noise even on SSD drives.
>
> Well, it's not an SSD or two I'm worried about - it's devices that
> can do millions of IOPS where this is likely to be noticeable...
>
> > The other list_lru users see items only once they become unused and
> > subsequent references are expected to be few and temporary, right?
>
> They go onto the list when the refcount falls to zero, but reuse can
> be frequent when being referenced repeatedly by a single user.
> That avoids every reuse from removing the object from the LRU then
> putting it back on the LRU for every reference cycle...

That's true, but it's less of a concern in the radix_tree_node case
because it takes a full inactive list cycle after a refault before the
node is put back on the LRU.  The only other way back is a very
unlikely, unluckily placed partial truncation/invalidation of the node
(full truncation would just delete the whole node anyway).

> > We expect pages to refault in spades on certain loads, at which point
> > we may have thousands of those nodes on the list that are no longer
> > reclaimable (10k nodes for about 2.5G of cache).
>
> Sure, look at the way the inode and dentry caches work - entire
> caches of millions of inodes and dentries often sit on the LRUs. A
> quick look at my workstation's dentry cache shows:
>
> $ cat /proc/sys/fs/dentry-state
> 180108	170596	45	0	0	0
>
> 180k allocated dentries, 170k sitting on the LRU...

Hm, and could a significant number of those 170k rotate on the next
shrinker scan due to recent references, or do you generally have
smaller spikes?

But as per above I think the case for lazily removing shadow nodes is
less convincing than for inodes and dentries.
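
To make the eager vs. lazy trade-off concrete, here is a rough
userspace sketch, not the patch itself.  It models a single node that
alternates between shadow-only and page-holding, and counts how often
the LRU list is touched under each policy.  Everything in it
(struct node, evict_page(), refault_page(), shrink_scan(), churn(),
the lru_ops counter) is made up for illustration; the list helpers
only imitate the kernel's list_head and no real list_lru calls or
locking are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy circular doubly linked list, imitating the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for a radix tree node. */
struct node {
	struct list_head lru;
	unsigned int pages;	/* real page pointers */
	unsigned int shadows;	/* shadow entries */
};

struct list_head shadow_lru = { &shadow_lru, &shadow_lru };
unsigned long lru_ops;		/* number of list additions/removals */

/* Reclaim one page; a node left with only shadows goes on the LRU. */
void evict_page(struct node *n)
{
	assert(n->pages > 0);
	n->pages--; n->shadows++;
	if (n->pages == 0 && list_empty(&n->lru)) {
		list_add_tail(&n->lru, &shadow_lru);
		lru_ops++;
	}
}

/* Refault one page; only the eager policy touches the LRU here. */
void refault_page(struct node *n, bool eager)
{
	assert(n->shadows > 0);
	n->shadows--; n->pages++;
	if (eager && !list_empty(&n->lru)) {
		list_del_init(&n->lru);
		lru_ops++;
	}
}

/* Shrinker pass: reclaim shadow-only nodes, lazily drop the others. */
unsigned int shrink_scan(void)
{
	unsigned int reclaimed = 0;
	struct list_head *pos = shadow_lru.next, *next;

	while (pos != &shadow_lru) {
		struct node *n = (struct node *)
			((char *)pos - offsetof(struct node, lru));
		next = pos->next;
		list_del_init(pos);
		lru_ops++;
		if (n->pages == 0) {	/* still shadow-only: reclaim it */
			n->shadows = 0;
			reclaimed++;
		}
		pos = next;
	}
	return reclaimed;
}

/* Run evict/refault cycles on one node; return the LRU operation count. */
unsigned long churn(struct node *n, bool eager, unsigned int cycles)
{
	list_init(&shadow_lru);
	lru_ops = 0;
	list_init(&n->lru);
	n->pages = 1; n->shadows = 0;
	for (unsigned int i = 0; i < cycles; i++) {
		evict_page(n);
		refault_page(n, eager);
	}
	return lru_ops;
}
```

With 1000 cycles, churn() with eager == true counts 2000 list
operations (one add and one delete per cycle).  With eager == false it
counts a single add, and a later shrink_scan() pays one more operation
to drop the no-longer-reclaimable node.  That is the saving Dave is
describing; my point above is only that for shadow nodes each such
cycle should take a full inactive list rotation, so the cycle count
stays small.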