(2014/09/19 22:22), Johannes Weiner wrote: > Memory is internally accounted in bytes, using spinlock-protected > 64-bit counters, even though the smallest accounting delta is a page. > The counter interface is also convoluted and does too many things. > > Introduce a new lockless word-sized page counter API, then change all > memory accounting over to it and remove the old one. The translation > from and to bytes then only happens when interfacing with userspace. > > Aside from the locking costs, this gets rid of the icky unsigned long > long types in the very heart of memcg, which is great for 32 bit and > also makes the code a lot more readable. > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> I like this patch because I hate res_counter very much. a few nitpick comments.. <snip> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 19df5d857411..bf8fb1a05597 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -54,6 +54,38 @@ struct mem_cgroup_reclaim_cookie { > }; > > #ifdef CONFIG_MEMCG > + > +struct page_counter { > + atomic_long_t count; > + unsigned long limit; > + struct page_counter *parent; > + > + /* legacy */ > + unsigned long watermark; > + unsigned long limited; > +}; I guees all attributes should be on the same cache line. How about align this to cache ? And legacy values are not very important to be atomic by design, right ? > + > +#if BITS_PER_LONG == 32 > +#define PAGE_COUNTER_MAX ULONG_MAX > +#else > +#define PAGE_COUNTER_MAX (ULONG_MAX / PAGE_SIZE) > +#endif > + <snip> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index e2def11f1ec1..dfd3b15a57e8 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -25,7 +25,6 @@ > * GNU General Public License for more details. > */ > > -#include <linux/res_counter.h> > #include <linux/memcontrol.h> > #include <linux/cgroup.h> > #include <linux/mm.h> > @@ -66,6 +65,117 @@ > > #include <trace/events/vmscan.h> > > +int page_counter_cancel(struct page_counter *counter, unsigned long nr_pages) > +{ > + long new; > + > + new = atomic_long_sub_return(nr_pages, &counter->count); > + > + if (WARN_ON(unlikely(new < 0))) > + atomic_long_set(&counter->count, 0); WARN_ON_ONCE() ? Or I prefer atomic_add(&counter->count, nr_pages) rather than set to 0 because if a buggy call's "nr_pages" is enough big, following calls to page_counter_cacnel() will show more logs. > + > + return new > 1; > +} > + > +int page_counter_charge(struct page_counter *counter, unsigned long nr_pages, > + struct page_counter **fail) > +{ > + struct page_counter *c; > + > + for (c = counter; c; c = c->parent) { > + for (;;) { > + unsigned long count; > + unsigned long new; > + > + count = atomic_long_read(&c->count); > + > + new = count + nr_pages; > + if (new > c->limit) { > + c->limited++; > + if (fail) { > + *fail = c; > + goto failed; > + } seeing res_counter(), c ret code for this case should be -ENOMEM. > + } > + > + if (atomic_long_cmpxchg(&c->count, count, new) != count) > + continue; > + > + if (new > c->watermark) > + c->watermark = new; > + > + break; > + } > + } > + return 0; > + > +failed: > + for (c = counter; c != *fail; c = c->parent) > + page_counter_cancel(c, nr_pages); > + > + return -ENOMEM; > +} > + > +int page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages) > +{ > + struct page_counter *c; > + int ret = 1; > + > + for (c = counter; c; c = c->parent) { > + int remainder; > + > + remainder = page_counter_cancel(c, nr_pages); > + if (c == counter && !remainder) > + ret = 0; > + } > + > + return ret; > +} > + > +int page_counter_limit(struct page_counter *counter, unsigned long limit) > +{ > + for (;;) { > + unsigned long count; > + unsigned long old; > + > + count = atomic_long_read(&counter->count); > + > + old = xchg(&counter->limit, limit); > + > + if (atomic_long_read(&counter->count) != count) { > + counter->limit = old; > + continue; > + } > + > + if (count > limit) { > + counter->limit = old; > + return -EBUSY; > + } > + > + return 0; > + } > +} I think the whole "updating limit" ops should be mutual exclusive. It seems there will be trouble if multiple updater comes at once. So, "xchg" isn't required. calllers should have their own locks. Thanks, -Kame -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html