On Thu, Jan 17, 2013 at 04:51:03PM -0800, Glauber Costa wrote:
> On 01/17/2013 04:10 PM, Dave Chinner wrote:
> > and we end up with:
> >
> > lru_add(struct lru_list *lru, struct lru_item *item)
> > {
> > 	node_id = min(object_to_nid(item), lru->numnodes);
> >
> > 	__lru_add(lru, node_id, &item->global_list);
> > 	if (memcg) {
> > 		memcg_lru = find_memcg_lru(lru->memcg_lists, memcg_id);
> > 		__lru_add(memcg_lru, node_id, &item->memcg_list);
> > 	}
> > }
>
> A follow up thought: If we have multiple memcgs, and global pressure
> kicks in (meaning none of them are particularly under pressure),
> shouldn't we try to maintain fairness among them and reclaim equal
> proportions from them all, the same way we do with sb's these days,
> for instance?

I don't like the complexity. The global lists will be reclaimed in
LRU order, so it's going to be as fair as can be. If there's a memcg
that has older unused objects than the others, then from a global
perspective they should be reclaimed first, because the memcg is not
using them...

> I would argue that if your memcg is small, the list of dentries is
> small: scanning it all for the nodes you want shouldn't hurt.

On the contrary - the memcg might be small, but what happens if
someone ran a find across all the filesystems on the system in it?
Then the LRU will be huge, and scanning it expensive... We can't
make static decisions about small and large, and we can't trust
heuristics to get it right, either. If we have a single list, we
don't/can't do node-aware reclaim efficiently and so shouldn't even
try.

> if the memcg is big, it will have per-node lists anyway.

But it may have no need for them due to the workload. ;)

> Given that, do we really want to pay the price of two list_heads
> in the objects?

I'm just looking at ways of making the infrastructure sane. If the
cost is an extra 16 bytes per object on an LRU, then that's a small
price to pay for having robust memory reclaim infrastructure....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs