Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations

On 01/19/2012 05:12 PM, Johannes Weiner wrote:
On Thu, Jan 19, 2012 at 04:51:16PM +0400, Glauber Costa wrote:
On 01/19/2012 04:48 PM, Johannes Weiner wrote:
On Wed, Jan 18, 2012 at 07:15:58PM +0400, Glauber Costa wrote:
There is a case in __sk_mem_schedule() where an allocation
is beyond the maximum, yet we are still allowed to proceed.
It happens under the following condition:

	sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some later point it will try to uncharge it. Since
the allocation is never communicated to the underlying res_counter
code, the res_counter uncharge operation becomes imbalanced.

I see two ways of fixing this:

1) storing the information about those allocations somewhere
    in memcg, and then deducting from that first, before
    we start draining the res_counter,
2) providing a slightly different allocation function for
    the res_counter, that matches the original behavior of
    the network code more closely.

I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
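
To make option 2 concrete, here is a minimal user-space sketch of the
difference between a strict charge and a no-fail charge. Everything in it
(struct toy_counter, toy_charge(), toy_charge_nofail(), toy_uncharge(),
toy_nofail_roundtrip()) is a hypothetical stand-in invented for illustration;
it is not the kernel's res_counter code, and it leaves out locking and the
parent hierarchy.

```c
#include <assert.h>

/* Hypothetical stand-in for a resource counter; not the kernel's struct. */
struct toy_counter {
	unsigned long long usage;
	unsigned long long limit;
};

/* Strict charge: refuses, and records nothing, if the limit would be exceeded. */
static int toy_charge(struct toy_counter *c, unsigned long long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/* No-fail charge: always records the charge; < 0 only reports "over limit". */
static int toy_charge_nofail(struct toy_counter *c, unsigned long long val)
{
	c->usage += val;
	return c->usage > c->limit ? -1 : 0;
}

/* Uncharge: the caller returns what it was charged earlier. */
static void toy_uncharge(struct toy_counter *c, unsigned long long val)
{
	assert(c->usage >= val);	/* would trip if a charge went unrecorded */
	c->usage -= val;
}

/* Usage left after an over-limit allocation is charged and later released. */
static unsigned long long toy_nofail_roundtrip(void)
{
	struct toy_counter c = { .usage = 0, .limit = 100 };

	toy_charge_nofail(&c, 80);	/* usage = 80, within the limit */
	toy_charge_nofail(&c, 40);	/* usage = 120, reported as over limit */
	toy_uncharge(&c, 120);		/* matches what was actually charged */
	return c.usage;
}
```

With the strict variant the second charge would have been refused and left
unrecorded, so a later uncharge of the full amount would not match the
counter. With the no-fail variant the counter may temporarily sit above its
limit, but charges and uncharges stay paired.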

I will eventually submit it through Dave for the -net tree,
but I wanted to query you guys first, to see if this approach
is acceptable or if you'd prefer me to try something else.

Thanks

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxx>
---
  include/linux/res_counter.h |    6 ++++++
  include/net/sock.h          |   10 ++++------
  kernel/res_counter.c        |   25 +++++++++++++++++++++++++
  net/core/sock.c             |    4 ++--
  4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..32a7b02 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -109,12 +109,18 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
   *
   * returns 0 on success and <0 if the counter->usage will exceed the
   * counter->limit _locked call expects the counter->lock to be taken
+ *
+ * charge_nofail works the same, except that it charges the resource
+ * counter unconditionally, and returns < 0 if after the current
+ * charge we are over limit.
   */

res_counter_margin() assumes usage <= limit is always true.  Just make
sure you return 0 if that is not the case, or the charge path can get
confused, thinking there is enough room and retry needlessly.

Otherwise, looks good.

You mean return zero in res_counter_charge_nofail() if we exceeded the limit?

I do that, since one needs to know the allocation was supposed to fail.
Or are you talking about something else?

Yes, I mean the calculation in res_counter_margin(), which is supposed
to tell the margin for new charges.  Its current code will underflow
and return something large when the usage exceeds the limit, which is
not possible before your patch, so I think you need to include this:

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..d06d014 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -142,7 +142,10 @@ static inline unsigned long long res_counter_margin(struct res_counter *cnt)
  	unsigned long flags;

  	spin_lock_irqsave(&cnt->lock, flags);
-	margin = cnt->limit - cnt->usage;
+	if (cnt->limit > cnt->usage)
+		margin = cnt->limit - cnt->usage;
+	else
+		margin = 0;
  	spin_unlock_irqrestore(&cnt->lock, flags);
  	return margin;
  }
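
The underflow described above can be reproduced in user space with a short
sketch. The names here (struct margin_counter, toy_margin_unclamped(),
toy_margin(), toy_margin_demo()) are hypothetical and only mirror the shape
of the hunk; the spinlock is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a resource counter; not the kernel's struct. */
struct margin_counter {
	unsigned long long usage;
	unsigned long long limit;
};

/* Unclamped margin: unsigned subtraction wraps when usage > limit. */
static unsigned long long toy_margin_unclamped(const struct margin_counter *c)
{
	return c->limit - c->usage;
}

/* Clamped margin: reports 0 once usage has reached or passed the limit. */
static unsigned long long toy_margin(const struct margin_counter *c)
{
	return c->limit > c->usage ? c->limit - c->usage : 0;
}

/* Usage: the two helpers agree below the limit and differ above it. */
static void toy_margin_demo(void)
{
	struct margin_counter under = { .usage = 30, .limit = 100 };
	struct margin_counter over = { .usage = 120, .limit = 100 };

	assert(toy_margin(&under) == 70);
	assert(toy_margin_unclamped(&under) == 70);
	assert(toy_margin(&over) == 0);
	/* 100 - 120 wraps to 2^64 - 20, i.e. ULLONG_MAX - 19. */
	assert(toy_margin_unclamped(&over) == ULLONG_MAX - 19);
}
```

A caller that treats the unclamped result as free room would conclude there
is plenty of space left and retry, which is the confusion the clamp avoids.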
Ah, I see.

Okay, I will update my patch to include this. (Or would you think it would be better to add this as a preparation patch?)
