On Wed 24-09-14 11:43:10, Johannes Weiner wrote: > All memory accounting and limiting has been switched over to the > lockless page counters. Bye, res_counter! > More than happily Acked-by: Michal Hocko <mhocko@xxxxxxx> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > --- > Documentation/cgroups/resource_counter.txt | 197 ------------------------- > include/linux/res_counter.h | 223 ----------------------------- > init/Kconfig | 6 - > kernel/Makefile | 1 - > kernel/res_counter.c | 211 --------------------------- > 5 files changed, 638 deletions(-) > delete mode 100644 Documentation/cgroups/resource_counter.txt > delete mode 100644 include/linux/res_counter.h > delete mode 100644 kernel/res_counter.c > > diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt > deleted file mode 100644 > index 762ca54eb929..000000000000 > --- a/Documentation/cgroups/resource_counter.txt > +++ /dev/null > @@ -1,197 +0,0 @@ > - > - The Resource Counter > - > -The resource counter, declared at include/linux/res_counter.h, > -is supposed to facilitate the resource management by controllers > -by providing common stuff for accounting. > - > -This "stuff" includes the res_counter structure and routines > -to work with it. > - > - > - > -1. Crucial parts of the res_counter structure > - > - a. unsigned long long usage > - > - The usage value shows the amount of a resource that is consumed > - by a group at a given time. The units of measurement should be > - determined by the controller that uses this counter. E.g. it can > - be bytes, items or any other unit the controller operates on. > - > - b. unsigned long long max_usage > - > - The maximal value of the usage over time. > - > - This value is useful when gathering statistical information about > - the particular group, as it shows the actual resource requirements > - for a particular group, not just some usage snapshot. > - > - c. unsigned long long limit > - > - The maximal allowed amount of resource to consume by the group. In > - case the group requests for more resources, so that the usage value > - would exceed the limit, the resource allocation is rejected (see > - the next section). > - > - d. unsigned long long failcnt > - > - The failcnt stands for "failures counter". This is the number of > - resource allocation attempts that failed. > - > - c. spinlock_t lock > - > - Protects changes of the above values. > - > - > - > -2. Basic accounting routines > - > - a. void res_counter_init(struct res_counter *rc, > - struct res_counter *rc_parent) > - > - Initializes the resource counter. As usual, should be the first > - routine called for a new counter. > - > - The struct res_counter *parent can be used to define a hierarchical > - child -> parent relationship directly in the res_counter structure, > - NULL can be used to define no relationship. > - > - c. int res_counter_charge(struct res_counter *rc, unsigned long val, > - struct res_counter **limit_fail_at) > - > - When a resource is about to be allocated it has to be accounted > - with the appropriate resource counter (controller should determine > - which one to use on its own). This operation is called "charging". > - > - This is not very important which operation - resource allocation > - or charging - is performed first, but > - * if the allocation is performed first, this may create a > - temporary resource over-usage by the time resource counter is > - charged; > - * if the charging is performed first, then it should be uncharged > - on error path (if the one is called). > - > - If the charging fails and a hierarchical dependency exists, the > - limit_fail_at parameter is set to the particular res_counter element > - where the charging failed. > - > - d. u64 res_counter_uncharge(struct res_counter *rc, unsigned long val) > - > - When a resource is released (freed) it should be de-accounted > - from the resource counter it was accounted to. This is called > - "uncharging". The return value of this function indicate the amount > - of charges still present in the counter. > - > - The _locked routines imply that the res_counter->lock is taken. > - > - e. u64 res_counter_uncharge_until > - (struct res_counter *rc, struct res_counter *top, > - unsigned long val) > - > - Almost same as res_counter_uncharge() but propagation of uncharge > - stops when rc == top. This is useful when kill a res_counter in > - child cgroup. > - > - 2.1 Other accounting routines > - > - There are more routines that may help you with common needs, like > - checking whether the limit is reached or resetting the max_usage > - value. They are all declared in include/linux/res_counter.h. > - > - > - > -3. Analyzing the resource counter registrations > - > - a. If the failcnt value constantly grows, this means that the counter's > - limit is too tight. Either the group is misbehaving and consumes too > - many resources, or the configuration is not suitable for the group > - and the limit should be increased. > - > - b. The max_usage value can be used to quickly tune the group. One may > - set the limits to maximal values and either load the container with > - a common pattern or leave one for a while. After this the max_usage > - value shows the amount of memory the container would require during > - its common activity. > - > - Setting the limit a bit above this value gives a pretty good > - configuration that works in most of the cases. > - > - c. If the max_usage is much less than the limit, but the failcnt value > - is growing, then the group tries to allocate a big chunk of resource > - at once. > - > - d. If the max_usage is much less than the limit, but the failcnt value > - is 0, then this group is given too high limit, that it does not > - require. It is better to lower the limit a bit leaving more resource > - for other groups. > - > - > - > -4. Communication with the control groups subsystem (cgroups) > - > -All the resource controllers that are using cgroups and resource counters > -should provide files (in the cgroup filesystem) to work with the resource > -counter fields. They are recommended to adhere to the following rules: > - > - a. File names > - > - Field name File name > - --------------------------------------------------- > - usage usage_in_<unit_of_measurement> > - max_usage max_usage_in_<unit_of_measurement> > - limit limit_in_<unit_of_measurement> > - failcnt failcnt > - lock no file :) > - > - b. Reading from file should show the corresponding field value in the > - appropriate format. > - > - c. Writing to file > - > - Field Expected behavior > - ---------------------------------- > - usage prohibited > - max_usage reset to usage > - limit set the limit > - failcnt reset to zero > - > - > - > -5. Usage example > - > - a. Declare a task group (take a look at cgroups subsystem for this) and > - fold a res_counter into it > - > - struct my_group { > - struct res_counter res; > - > - <other fields> > - } > - > - b. Put hooks in resource allocation/release paths > - > - int alloc_something(...) > - { > - if (res_counter_charge(res_counter_ptr, amount) < 0) > - return -ENOMEM; > - > - <allocate the resource and return to the caller> > - } > - > - void release_something(...) > - { > - res_counter_uncharge(res_counter_ptr, amount); > - > - <release the resource> > - } > - > - In order to keep the usage value self-consistent, both the > - "res_counter_ptr" and the "amount" in release_something() should be > - the same as they were in the alloc_something() when the releasing > - resource was allocated. > - > - c. Provide the way to read res_counter values and set them (the cgroups > - still can help with it). > - > - c. Compile and run :) > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > deleted file mode 100644 > index 56b7bc32db4f..000000000000 > --- a/include/linux/res_counter.h > +++ /dev/null > @@ -1,223 +0,0 @@ > -#ifndef __RES_COUNTER_H__ > -#define __RES_COUNTER_H__ > - > -/* > - * Resource Counters > - * Contain common data types and routines for resource accounting > - * > - * Copyright 2007 OpenVZ SWsoft Inc > - * > - * Author: Pavel Emelianov <xemul@xxxxxxxxxx> > - * > - * See Documentation/cgroups/resource_counter.txt for more > - * info about what this counter is. > - */ > - > -#include <linux/spinlock.h> > -#include <linux/errno.h> > - > -/* > - * The core object. the cgroup that wishes to account for some > - * resource may include this counter into its structures and use > - * the helpers described beyond > - */ > - > -struct res_counter { > - /* > - * the current resource consumption level > - */ > - unsigned long long usage; > - /* > - * the maximal value of the usage from the counter creation > - */ > - unsigned long long max_usage; > - /* > - * the limit that usage cannot exceed > - */ > - unsigned long long limit; > - /* > - * the limit that usage can be exceed > - */ > - unsigned long long soft_limit; > - /* > - * the number of unsuccessful attempts to consume the resource > - */ > - unsigned long long failcnt; > - /* > - * the lock to protect all of the above. > - * the routines below consider this to be IRQ-safe > - */ > - spinlock_t lock; > - /* > - * Parent counter, used for hierarchial resource accounting > - */ > - struct res_counter *parent; > -}; > - > -#define RES_COUNTER_MAX ULLONG_MAX > - > -/** > - * Helpers to interact with userspace > - * res_counter_read_u64() - returns the value of the specified member. > - * res_counter_read/_write - put/get the specified fields from the > - * res_counter struct to/from the user > - * > - * @counter: the counter in question > - * @member: the field to work with (see RES_xxx below) > - * @buf: the buffer to opeate on,... > - * @nbytes: its size... > - * @pos: and the offset. > - */ > - > -u64 res_counter_read_u64(struct res_counter *counter, int member); > - > -ssize_t res_counter_read(struct res_counter *counter, int member, > - const char __user *buf, size_t nbytes, loff_t *pos, > - int (*read_strategy)(unsigned long long val, char *s)); > - > -int res_counter_memparse_write_strategy(const char *buf, > - unsigned long long *res); > - > -/* > - * the field descriptors. one for each member of res_counter > - */ > - > -enum { > - RES_USAGE, > - RES_MAX_USAGE, > - RES_LIMIT, > - RES_FAILCNT, > - RES_SOFT_LIMIT, > -}; > - > -/* > - * helpers for accounting > - */ > - > -void res_counter_init(struct res_counter *counter, struct res_counter *parent); > - > -/* > - * charge - try to consume more resource. > - * > - * @counter: the counter > - * @val: the amount of the resource. each controller defines its own > - * units, e.g. numbers, bytes, Kbytes, etc > - * > - * returns 0 on success and <0 if the counter->usage will exceed the > - * counter->limit > - * > - * charge_nofail works the same, except that it charges the resource > - * counter unconditionally, and returns < 0 if the after the current > - * charge we are over limit. > - */ > - > -int __must_check res_counter_charge(struct res_counter *counter, > - unsigned long val, struct res_counter **limit_fail_at); > -int res_counter_charge_nofail(struct res_counter *counter, > - unsigned long val, struct res_counter **limit_fail_at); > - > -/* > - * uncharge - tell that some portion of the resource is released > - * > - * @counter: the counter > - * @val: the amount of the resource > - * > - * these calls check for usage underflow and show a warning on the console > - * > - * returns the total charges still present in @counter. > - */ > - > -u64 res_counter_uncharge(struct res_counter *counter, unsigned long val); > - > -u64 res_counter_uncharge_until(struct res_counter *counter, > - struct res_counter *top, > - unsigned long val); > -/** > - * res_counter_margin - calculate chargeable space of a counter > - * @cnt: the counter > - * > - * Returns the difference between the hard limit and the current usage > - * of resource counter @cnt. > - */ > -static inline unsigned long long res_counter_margin(struct res_counter *cnt) > -{ > - unsigned long long margin; > - unsigned long flags; > - > - spin_lock_irqsave(&cnt->lock, flags); > - if (cnt->limit > cnt->usage) > - margin = cnt->limit - cnt->usage; > - else > - margin = 0; > - spin_unlock_irqrestore(&cnt->lock, flags); > - return margin; > -} > - > -/** > - * Get the difference between the usage and the soft limit > - * @cnt: The counter > - * > - * Returns 0 if usage is less than or equal to soft limit > - * The difference between usage and soft limit, otherwise. > - */ > -static inline unsigned long long > -res_counter_soft_limit_excess(struct res_counter *cnt) > -{ > - unsigned long long excess; > - unsigned long flags; > - > - spin_lock_irqsave(&cnt->lock, flags); > - if (cnt->usage <= cnt->soft_limit) > - excess = 0; > - else > - excess = cnt->usage - cnt->soft_limit; > - spin_unlock_irqrestore(&cnt->lock, flags); > - return excess; > -} > - > -static inline void res_counter_reset_max(struct res_counter *cnt) > -{ > - unsigned long flags; > - > - spin_lock_irqsave(&cnt->lock, flags); > - cnt->max_usage = cnt->usage; > - spin_unlock_irqrestore(&cnt->lock, flags); > -} > - > -static inline void res_counter_reset_failcnt(struct res_counter *cnt) > -{ > - unsigned long flags; > - > - spin_lock_irqsave(&cnt->lock, flags); > - cnt->failcnt = 0; > - spin_unlock_irqrestore(&cnt->lock, flags); > -} > - > -static inline int res_counter_set_limit(struct res_counter *cnt, > - unsigned long long limit) > -{ > - unsigned long flags; > - int ret = -EBUSY; > - > - spin_lock_irqsave(&cnt->lock, flags); > - if (cnt->usage <= limit) { > - cnt->limit = limit; > - ret = 0; > - } > - spin_unlock_irqrestore(&cnt->lock, flags); > - return ret; > -} > - > -static inline int > -res_counter_set_soft_limit(struct res_counter *cnt, > - unsigned long long soft_limit) > -{ > - unsigned long flags; > - > - spin_lock_irqsave(&cnt->lock, flags); > - cnt->soft_limit = soft_limit; > - spin_unlock_irqrestore(&cnt->lock, flags); > - return 0; > -} > - > -#endif > diff --git a/init/Kconfig b/init/Kconfig > index eddec767b7ee..e503efe34bc0 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -977,12 +977,6 @@ config CGROUP_CPUACCT > Provides a simple Resource Controller for monitoring the > total CPU consumed by the tasks in a cgroup. > > -config RESOURCE_COUNTERS > - bool "Resource counters" > - help > - This option enables controller independent resource accounting > - infrastructure that works with cgroups. > - > config PAGE_COUNTER > bool > > diff --git a/kernel/Makefile b/kernel/Makefile > index 726e18443da0..245953354974 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -58,7 +58,6 @@ obj-$(CONFIG_USER_NS) += user_namespace.o > obj-$(CONFIG_PID_NS) += pid_namespace.o > obj-$(CONFIG_DEBUG_SYNCHRO_TEST) += synchro-test.o > obj-$(CONFIG_IKCONFIG) += configs.o > -obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o > obj-$(CONFIG_SMP) += stop_machine.o > obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o > obj-$(CONFIG_AUDIT) += audit.o auditfilter.o > diff --git a/kernel/res_counter.c b/kernel/res_counter.c > deleted file mode 100644 > index e791130f85a7..000000000000 > --- a/kernel/res_counter.c > +++ /dev/null > @@ -1,211 +0,0 @@ > -/* > - * resource cgroups > - * > - * Copyright 2007 OpenVZ SWsoft Inc > - * > - * Author: Pavel Emelianov <xemul@xxxxxxxxxx> > - * > - */ > - > -#include <linux/types.h> > -#include <linux/parser.h> > -#include <linux/fs.h> > -#include <linux/res_counter.h> > -#include <linux/uaccess.h> > -#include <linux/mm.h> > - > -void res_counter_init(struct res_counter *counter, struct res_counter *parent) > -{ > - spin_lock_init(&counter->lock); > - counter->limit = RES_COUNTER_MAX; > - counter->soft_limit = RES_COUNTER_MAX; > - counter->parent = parent; > -} > - > -static u64 res_counter_uncharge_locked(struct res_counter *counter, > - unsigned long val) > -{ > - if (WARN_ON(counter->usage < val)) > - val = counter->usage; > - > - counter->usage -= val; > - return counter->usage; > -} > - > -static int res_counter_charge_locked(struct res_counter *counter, > - unsigned long val, bool force) > -{ > - int ret = 0; > - > - if (counter->usage + val > counter->limit) { > - counter->failcnt++; > - ret = -ENOMEM; > - if (!force) > - return ret; > - } > - > - counter->usage += val; > - if (counter->usage > counter->max_usage) > - counter->max_usage = counter->usage; > - return ret; > -} > - > -static int __res_counter_charge(struct res_counter *counter, unsigned long val, > - struct res_counter **limit_fail_at, bool force) > -{ > - int ret, r; > - unsigned long flags; > - struct res_counter *c, *u; > - > - r = ret = 0; > - *limit_fail_at = NULL; > - local_irq_save(flags); > - for (c = counter; c != NULL; c = c->parent) { > - spin_lock(&c->lock); > - r = res_counter_charge_locked(c, val, force); > - spin_unlock(&c->lock); > - if (r < 0 && !ret) { > - ret = r; > - *limit_fail_at = c; > - if (!force) > - break; > - } > - } > - > - if (ret < 0 && !force) { > - for (u = counter; u != c; u = u->parent) { > - spin_lock(&u->lock); > - res_counter_uncharge_locked(u, val); > - spin_unlock(&u->lock); > - } > - } > - local_irq_restore(flags); > - > - return ret; > -} > - > -int res_counter_charge(struct res_counter *counter, unsigned long val, > - struct res_counter **limit_fail_at) > -{ > - return __res_counter_charge(counter, val, limit_fail_at, false); > -} > - > -int res_counter_charge_nofail(struct res_counter *counter, unsigned long val, > - struct res_counter **limit_fail_at) > -{ > - return __res_counter_charge(counter, val, limit_fail_at, true); > -} > - > -u64 res_counter_uncharge_until(struct res_counter *counter, > - struct res_counter *top, > - unsigned long val) > -{ > - unsigned long flags; > - struct res_counter *c; > - u64 ret = 0; > - > - local_irq_save(flags); > - for (c = counter; c != top; c = c->parent) { > - u64 r; > - spin_lock(&c->lock); > - r = res_counter_uncharge_locked(c, val); > - if (c == counter) > - ret = r; > - spin_unlock(&c->lock); > - } > - local_irq_restore(flags); > - return ret; > -} > - > -u64 res_counter_uncharge(struct res_counter *counter, unsigned long val) > -{ > - return res_counter_uncharge_until(counter, NULL, val); > -} > - > -static inline unsigned long long * > -res_counter_member(struct res_counter *counter, int member) > -{ > - switch (member) { > - case RES_USAGE: > - return &counter->usage; > - case RES_MAX_USAGE: > - return &counter->max_usage; > - case RES_LIMIT: > - return &counter->limit; > - case RES_FAILCNT: > - return &counter->failcnt; > - case RES_SOFT_LIMIT: > - return &counter->soft_limit; > - }; > - > - BUG(); > - return NULL; > -} > - > -ssize_t res_counter_read(struct res_counter *counter, int member, > - const char __user *userbuf, size_t nbytes, loff_t *pos, > - int (*read_strategy)(unsigned long long val, char *st_buf)) > -{ > - unsigned long long *val; > - char buf[64], *s; > - > - s = buf; > - val = res_counter_member(counter, member); > - if (read_strategy) > - s += read_strategy(*val, s); > - else > - s += sprintf(s, "%llu\n", *val); > - return simple_read_from_buffer((void __user *)userbuf, nbytes, > - pos, buf, s - buf); > -} > - > -#if BITS_PER_LONG == 32 > -u64 res_counter_read_u64(struct res_counter *counter, int member) > -{ > - unsigned long flags; > - u64 ret; > - > - spin_lock_irqsave(&counter->lock, flags); > - ret = *res_counter_member(counter, member); > - spin_unlock_irqrestore(&counter->lock, flags); > - > - return ret; > -} > -#else > -u64 res_counter_read_u64(struct res_counter *counter, int member) > -{ > - return *res_counter_member(counter, member); > -} > -#endif > - > -int res_counter_memparse_write_strategy(const char *buf, > - unsigned long long *resp) > -{ > - char *end; > - unsigned long long res; > - > - /* return RES_COUNTER_MAX(unlimited) if "-1" is specified */ > - if (*buf == '-') { > - int rc = kstrtoull(buf + 1, 10, &res); > - > - if (rc) > - return rc; > - if (res != 1) > - return -EINVAL; > - *resp = RES_COUNTER_MAX; > - return 0; > - } > - > - res = memparse(buf, &end); > - if (*end != '\0') > - return -EINVAL; > - > - if (PAGE_ALIGN(res) >= res) > - res = PAGE_ALIGN(res); > - else > - res = RES_COUNTER_MAX; > - > - *resp = res; > - > - return 0; > -} > -- > 2.1.0 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>