Re: [LSF/MM TOPIC] memory control groups

Johannes Weiner <hannes@xxxxxxxxxxx> · Tue, 18 Jan 2011 11:20:06 +0100

On Tue, Jan 18, 2011 at 06:17:57PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 18 Jan 2011 09:40:13 +0100
> Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > On Tue, Jan 18, 2011 at 10:10:57AM +0900, KAMEZAWA Hiroyuki wrote:
> > > - pc->page can be replaced with some lookup routine.
> > >   But Section bit encoding may be something mysterious and look up cost
> > >   will be problem.
> > 
> > Why is that?
> > 
> > The lookup is actually straight-forward, like lookup_page_cgroup().
> > And we only need it when coming from the per-cgroup LRU, i.e. in
> > reclaim and force_empty.
> >  
> 
> I see usage of pc->page is not very frequent. But I wonder we should
> revisit performance of lookup_page_cgroup() before adding new weight.

I think those are two different things to tackle.  But I will make
sure to check for performance overhead when removing pc->page.

> > > - I'm not sure PCG_MIGRATION. It's for avoiding races.
> > 
> > That's also a scary patch...  Yeah, it's to prevent uncharging of
> > oldpage in case migration fails and it has to be reused.  I changed
> > the migration sequence for memcg a bit so that we don't have to do
> > that anymore.  It survived basic testing.
> > 
> 
> Hmm. I saw level down of migration under memcg several times. So, I don't
> want to modify running one without enough reason.
> I guess all SECTION_BITS can be encoded to pc->flags without diet of flags.

That's true, there is enough room for that.

Those reduction patches I only wrote to also pack the pc->mem_cgroup
ID into pc->flags, but these are two independent problems.

I would not have finished the patch only for that one tiny flag, but
it actually saved code and made it IMO a bit easier to understand.  I
consider this a serious upside of code that has a history of breaking.

But one at the time, first I will finish testing and benchmarking the
pc->page removal.

> > E.g. I have a suspicion that we might be able to do dirty accounting
> > without all the flags (we have them in the page anyway!) but use
> > proportionals instead.  It's not page-accurate, but I think the
> > fundamental problem is solved: when the dirty ratio is exceeded,
> > throttle the cgroup with the biggest dirty share.
> 
> Using proportionals is a choice. But, IIUC, users of memcg wants 
> something like /proc/meminfo. It doesn't match.
> If I'm an user of container, I want an information like /proc/meminfo for
> container.

I totally agree that this is information that needs exporting.

But you can easily calculate an absolute number of bytes by applying a
memcg's relative proportion to the absolute amount of dirty pages for
example.  The only difference is that it probably won't be 100%
accurate, but a few pages difference should really not matter for
user-visible statistics.

No?

> Anyway, if the kernel goes to merge IO-less page reclaim, dirty ratio
> support is the 1st thing we have to implement.
> Without that, memcg will easily OOM.

Agreed.  I am not saying that my memory footprint concerns should
stand in the way of merging important infrastructure.  This is work
that can still be done even after dirty accounting is merged.

Thanks,
	Hannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>