On Thu, Dec 10, 2015 at 11:00:27AM -0500, Johannes Weiner wrote: > On Thu, Dec 10, 2015 at 02:39:14PM +0300, Vladimir Davydov wrote: ... > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index c6a5ed2f2744..993c9a26b637 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -169,6 +169,7 @@ struct mem_cgroup { > > > > /* Accounted resources */ > > struct page_counter memory; > > + struct page_counter swap; > > struct page_counter memsw; > > struct page_counter kmem; > > We should probably separate this to differentiate the new counters > from the old ones. Only memory and swap are actual resources, the > memsw and kmem counters are counting consumer-oriented. Yeah, but we'd better do it in a separate patch. > > > diff --git a/include/linux/swap.h b/include/linux/swap.h > > index 457181844b6e..f4b3ccdcba91 100644 > > --- a/include/linux/swap.h > > +++ b/include/linux/swap.h > > @@ -368,11 +368,16 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem) > > #endif > > #ifdef CONFIG_MEMCG_SWAP > > extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry); > > +extern int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry); > > Should this be mem_cgroup_try_swap() to keep in line with the page > counter terminology? So it's clear this is not forcing a charge. Hmm, I thought we only use try_charge name in memcontrol.c if there is commit stage, e.g. we have memcg_kmem_charge, not memcg_kmem_try_charge. This conflicts with page_counter semantics though, so we might want to rename it. > > > @@ -1248,12 +1248,15 @@ static unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg) > > { > > unsigned long limit; > > > > - limit = memcg->memory.limit; > > + limit = READ_ONCE(memcg->memory.limit); > > if (mem_cgroup_swappiness(memcg)) { > > unsigned long memsw_limit; > > + unsigned long swap_limit; > > > > - memsw_limit = memcg->memsw.limit; > > - limit = min(limit + total_swap_pages, memsw_limit); > > + memsw_limit = READ_ONCE(memcg->memsw.limit); > > + swap_limit = min(READ_ONCE(memcg->swap.limit), > > + (unsigned long)total_swap_pages); > > + limit = min(limit + swap_limit, memsw_limit); > > } > > return limit; > > This is taking a racy snapshot, so we don't rely on 100% accuracy. Can > we do without the READ_ONCE()? Well, I suppose we can, but passing a volatile value to min macro looks a bit scary to me. What if swap_limit is changed from a finite value less than total_swap_pages to inf while we are there? We might use an infinite memory size in OOM which would screw up OOM scores AFAIU. Unlikely, but still. > > > @@ -5754,26 +5760,66 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) > > memcg_check_events(memcg, page); > > } > > > > +/* > > + * mem_cgroup_charge_swap - charge a swap entry > > + * @page: page being added to swap > > + * @entry: swap entry to charge > > + * > > + * Try to charge @entry to the memcg that @page belongs to. > > + * > > + * Returns 0 on success, -ENOMEM on failure. > > + */ > > +int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry) > > +{ > > + struct mem_cgroup *memcg; > > + struct page_counter *counter; > > + unsigned short oldid; > > + > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !do_swap_account) > > + return 0; > > + > > + memcg = page->mem_cgroup; > > + > > + /* Readahead page, never charged */ > > + if (!memcg) > > + return 0; > > + > > + if (!mem_cgroup_is_root(memcg) && > > + !page_counter_try_charge(&memcg->swap, 1, &counter)) > > + return -ENOMEM; > > + > > + oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg)); > > + VM_BUG_ON_PAGE(oldid, page); > > + mem_cgroup_swap_statistics(memcg, true); > > + > > + css_get(&memcg->css); > > I think we don't have to duplicate the swap record code. Both cgroup1 > and cgroup2 could run this function to handle the swapout record and > statistics, and then mem_cgroup_swapout() would simply uncharge memsw. Well, may be. I'm afraid this might make mem_cgroup_charge_swap look a bit messy due to necessity to check cgroup_subsys_on_dfl in the middle of it, but I'll give it a try. > > > @@ -5828,6 +5931,8 @@ static int __init mem_cgroup_swap_init(void) > > { > > if (!mem_cgroup_disabled() && really_do_swap_account) { > > do_swap_account = 1; > > + WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, > > + swap_files)); > > WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, > > memsw_cgroup_files)); > > I guess we could also support cgroup.memory=noswap. > Yeah, that would be cleaner. I wonder if we could drop swapaccount boot param then so as not to clutter the API. Thanks for the review! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>