On Mon, 3 Feb 2014 19:53:42 -0500 Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > Previously, page cache radix tree nodes were freed after reclaim > emptied out their page pointers. But now reclaim stores shadow > entries in their place, which are only reclaimed when the inodes > themselves are reclaimed. This is problematic for bigger files that > are still in use after they have a significant amount of their cache > reclaimed, without any of those pages actually refaulting. The shadow > entries will just sit there and waste memory. In the worst case, the > shadow entries will accumulate until the machine runs out of memory. > > To get this under control, the VM will track radix tree nodes > exclusively containing shadow entries on a per-NUMA node list. > Per-NUMA rather than global because we expect the radix tree nodes > themselves to be allocated node-locally and we want to reduce > cross-node references of otherwise independent cache workloads. A > simple shrinker will then reclaim these nodes on memory pressure. > > A few things need to be stored in the radix tree node to implement the > shadow node LRU and allow tree deletions coming from the list: > > 1. There is no index available that would describe the reverse path > from the node up to the tree root, which is needed to perform a > deletion. To solve this, encode in each node its offset inside the > parent. This can be stored in the unused upper bits of the same > member that stores the node's height at no extra space cost. > > 2. The number of shadow entries needs to be counted in addition to the > regular entries, to quickly detect when the node is ready to go to > the shadow node LRU list. The current entry count is an unsigned > int but the maximum number of entries is 64, so a shadow counter > can easily be stored in the unused upper bits. > > 3. Tree modification needs tree lock and tree root, which are located > in the address space, so store an address_space backpointer in the > node. The parent pointer of the node is in a union with the 2-word > rcu_head, so the backpointer comes at no extra cost as well. > > 4. The node needs to be linked to an LRU list, which requires a list > head inside the node. This does increase the size of the node, but > it does not change the number of objects that fit into a slab page. changelog forgot to mention that this reclaim is performed via a shrinker... How expensive is that list walk in scan_shadow_nodes()? I assume in the best case it will bale out after nr_to_scan iterations? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html