Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 19, 2012 at 04:51:16PM +0400, Glauber Costa wrote:
> On 01/19/2012 04:48 PM, Johannes Weiner wrote:
> >On Wed, Jan 18, 2012 at 07:15:58PM +0400, Glauber Costa wrote:
> >>There is a case in __sk_mem_schedule(), where an allocation
> >>is beyond the maximum, but yet we are allowed to proceed.
> >>It happens under the following condition:
> >>
> >>	sk->sk_wmem_queued + size>= sk->sk_sndbuf
> >>
> >>The network code won't revert the allocation in this case,
> >>meaning that at some point later it'll try to do it. Since
> >>this is never communicated to the underlying res_counter
> >>code, there is an inbalance in res_counter uncharge operation.
> >>
> >>I see two ways of fixing this:
> >>
> >>1) storing the information about those allocations somewhere
> >>    in memcg, and then deducting from that first, before
> >>    we start draining the res_counter,
> >>2) providing a slightly different allocation function for
> >>    the res_counter, that matches the original behavior of
> >>    the network code more closely.
> >>
> >>I decided to go for #2 here, believing it to be more elegant,
> >>since #1 would require us to do basically that, but in a more
> >>obscure way.
> >>
> >>I will eventually submit it through Dave for the -net tree,
> >>but I wanted to query you guys first, to see if this approach
> >>is acceptable or if you'd prefer me to try something else.
> >>
> >>Thanks
> >>
> >>Signed-off-by: Glauber Costa<glommer@xxxxxxxxxxxxx>
> >>Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >>Cc: Johannes Weiner<hannes@xxxxxxxxxxx>
> >>Cc: Michal Hocko<mhocko@xxxxxxx>
> >>---
> >>  include/linux/res_counter.h |    6 ++++++
> >>  include/net/sock.h          |   10 ++++------
> >>  kernel/res_counter.c        |   25 +++++++++++++++++++++++++
> >>  net/core/sock.c             |    4 ++--
> >>  4 files changed, 37 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> >>index c9d625c..32a7b02 100644
> >>--- a/include/linux/res_counter.h
> >>+++ b/include/linux/res_counter.h
> >>@@ -109,12 +109,18 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
> >>   *
> >>   * returns 0 on success and<0 if the counter->usage will exceed the
> >>   * counter->limit _locked call expects the counter->lock to be taken
> >>+ *
> >>+ * charge_nofail works the same, except that it charges the resource
> >>+ * counter unconditionally, and returns<  0 if the after the current
> >>+ * charge we are over limit.
> >>   */
> >
> >res_counter_margin() assumes usage<= limit is always true.  Just make
> >sure you return 0 if that is not the case, or the charge path can get
> >confused, thinking there is enough room and retry needlessly.
> >
> >Otherwise, looks good.
> 
> You mean return zero in res_counter_charge_fail() if we exceeded the limit?
> 
> I do that, since one needs to know the allocation was supposed to fail.
> Or are you talking about something else ?

Yes, I mean the calculation in res_counter_margin(), which is supposed
to tell the margin for new charges.  Its current code will underflow
and return something large when the usage exceeds the limit, which is
not possible before your patch, so I think you need to include this:

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..d06d014 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -142,7 +142,10 @@ static inline unsigned long long res_counter_margin(struct res_counter *cnt)
 	unsigned long flags;
 
 	spin_lock_irqsave(&cnt->lock, flags);
-	margin = cnt->limit - cnt->usage;
+	if (cnt->limit > cnt->usage)
+		margin = cnt->limit - cnt->usage;
+	else
+		margin = 0;
 	spin_unlock_irqrestore(&cnt->lock, flags);
 	return margin;
 }
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux