Re: [PATCH 1/7] memcg: add high/low watermark to res_counter

On Sun, 8 May 2011 22:40:47 -0700
Ying Han <yinghan@xxxxxxxxxx> wrote:

> On Thu, May 5, 2011 at 10:28 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Thu, 5 May 2011 08:59:01 +0200
> > Michal Hocko <mhocko@xxxxxxx> wrote:
> >
> >> On Wed 04-05-11 10:16:39, Ying Han wrote:
> >> > On Wed, May 4, 2011 at 1:58 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
> >> > > On Tue 03-05-11 10:01:27, Ying Han wrote:
> >> > >> On Tue, May 3, 2011 at 1:25 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
> >> > >> > On Tue 03-05-11 16:45:23, KOSAKI Motohiro wrote:
> >> > >> >> 2011/5/3 Michal Hocko <mhocko@xxxxxxx>:
> >> > >> >> > On Sun 01-05-11 15:06:02, KOSAKI Motohiro wrote:
> >> > >> >> >> > On Mon 25-04-11 18:28:49, KAMEZAWA Hiroyuki wrote:
> >> > > [...]
> >> > >> >> >> Can you please clarify this? I feel it is not opposite semantics.
> >> > >> >> >
> >> > >> >> > In the global reclaim low watermark represents the point when we _start_
> >> > >> >> > background reclaim while high watermark is the _stopper_. Watermarks are
> >> > >> >> > based on the free memory while this proposal makes it based on the used
> >> > >> >> > memory.
> >> > >> >> > I understand that the result is same in the end but it is really
> >> > >> >> > confusing because you have to switch your mindset from free to used and
> >> > >> >> > from under the limit to above the limit.
> >> > >> >>
> >> > >> >> Ah, right. So, do you have an alternative idea?
> >> > >> >
> >> > >> > Why cannot we just keep the global reclaim semantic and make it free
> >> > >> > memory (hard_limit - usage_in_bytes) based with low limit as the trigger
> >> > >> > for reclaiming?
> >> > >>
> >> > > [...]
> >> > >> The current scheme
> >> > >
> >> > > What is the current scheme?
> >> >
> >> > using the "usage_in_bytes" instead of "free"
> >> >
> >> > >> is closer to the global bg reclaim which the low is triggering reclaim
> >> > >> and high is stopping reclaim. And we can only use the "usage" to keep
> >> > >> the same API.
> >>
> >
> > Sorry for long absence.
> >
> >> And how is this closer to the global reclaim semantic which is based on
> >> the available memory?
> >
> > It will never be the same feature, nor even a similar one, I think.
> >
> >> What I am trying to say here is that this new watermark concept doesn't
> >> fit in with the global reclaim. Well, standard user might not be aware
> >> of the zone watermarks at all because they cannot be set. But still if
> >> you are analyzing your memory usage you still check and compare free
> >> memory to min/low/high watermarks to find out what is the current memory
> >> pressure.
> >> If we had another concept with cgroups you would need to switch your
> >> mindset to analyze things.
> >>
> >> I am sorry, but I still do not see any reason why those cgroup watermarks
> >> cannot be based on total-usage.
> >
> > Hmm, so, the interface should be
> >
> >  memory.watermark  --- the total usage which kernel's memory shrinker starts.
> >
> > ?
> 
> 
> >
> > I'm okay with this. And I think this parameter should be fully independent from
> > the limit.
> 
> We need two watermarks like high/low where one is used to trigger the
> background reclaim and the other one is for stopping it. 

To avoid confusion, I'll use other words: "shrink_to" and "shrink_over".
When the usage goes over "shrink_over", the kernel reduces the usage to "shrink_to".


IMHO, determining the shrink_over - shrink_to distance is both difficult and easy.
It's difficult because it depends on the workload, and if the distance is too large,
reclaim will consume more CPU time than expected. It's easy because a small
shrink_over - shrink_to distance works well for usual use; I set 4MB in my series.
(The shrink_over - shrink_to distance is meaningless for users, I think.)

I think shrink_over - shrink_to is an implementation detail, just for avoiding
frequent switching of memory reclaim on and off; IOW, for doing the job in a batched manner.

So, my patch hides "shrink_over" and just shows "shrink_to".
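
A minimal sketch of the batching idea, written in Python as a model and not as kernel code. It assumes "shrink_over" is derived as shrink_to plus a fixed margin (4MB, the value mentioned for the series); the function name and the `reclaiming` flag are hypothetical and only illustrate the on/off hysteresis.

```python
MB = 1024 * 1024

# Assumption: a fixed system-chosen margin; 4MB is the value mentioned in the mail.
SHRINK_MARGIN = 4 * MB


def update_reclaim_state(usage, shrink_to, reclaiming, margin=SHRINK_MARGIN):
    """Return True if background reclaim should be running after this sample."""
    shrink_over = shrink_to + margin  # hidden from the user
    if usage > shrink_over:
        return True        # above the trigger: start (or keep) reclaiming
    if usage <= shrink_to:
        return False       # target reached: stop
    return reclaiming      # between the two values: keep the previous state
```

With this shape, reclaim does not flip on and off for every small change in usage around shrink_to; it starts once per crossing of shrink_over and runs until the usage is back at shrink_to.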


> Using the
> limit to calculate the wmarks is straight-forward since doing
> background reclaim reduces the latency spikes under direct reclaim.
> The direct reclaim is triggered while the usage is hitting the limit.
> 
> This is different from the "soft_limit" which is based on the usage
> and we don't want to reinvent the soft_limit implementation.
> 
Yes, this is a different feature.


The discussion here is how to design the API for "shrink_to" and "shrink_over", OK?

I think there are 3 candidates.

  1. using distance to limit.
     memory.shrink_to_distance
           - memory will be freed to 'limit - shrink_to_distance'.
     memory.shrink_over_distance
           - memory will be freed when usage > 'limit - shrink_over_distance'

     Pros.
      - Both shrink_over and shrink_to can be determined by users.
      - Can keep a stable distance to the limit even when the limit is changed.
     Cons.
      - complicated and does not seem natural.
      - hierarchy support will be very difficult.

  2. using bare value
     memory.shrink_to
           - memory will be freed to this 'shrink_to'
     memory.shrink_from
           - memory will be freed when the usage goes over this value.
     Pros.
      - Both shrink_over and shrink_to can be determined by users.
      - easy to understand, straightforward.
      - hierarchy support will be easy.
     Cons.
      - The user may need to change this value when he changes the limit.


  3. using only 'shrink_to'
     memory.shrink_to
           - memory will be freed to this value when the usage goes over this value
             to some extent (determined by the system.)

     Pros.
      - easy interface.
      - hierarchy support will be easy.
      - checking for bad configuration is very easy.
     Cons.
      - The user may need to change this value when he changes the limit.
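
To compare the three candidates side by side, here is a rough Python model of how each one would resolve to an effective (shrink_to, shrink_over) pair. The function names, the 4MB margin for candidate 3, and the `is_sane` check are assumptions for illustration, not an existing interface.

```python
MB = 1024 * 1024


def thresholds_by_distance(limit, shrink_to_distance, shrink_over_distance):
    """Candidate 1: both values are distances below the limit."""
    return limit - shrink_to_distance, limit - shrink_over_distance


def thresholds_bare(shrink_to, shrink_from):
    """Candidate 2: both values are given directly by the user."""
    return shrink_to, shrink_from


def thresholds_shrink_to_only(shrink_to, margin=4 * MB):
    """Candidate 3: the system adds its own margin (assumed 4MB here)."""
    return shrink_to, shrink_to + margin


def is_sane(limit, shrink_to, shrink_over):
    """A simple configuration check: 0 <= shrink_to <= shrink_over <= limit."""
    return 0 <= shrink_to <= shrink_over <= limit
```

For example, with candidate 1 and distances of 32MB and 16MB, lowering the limit from 512MB to 256MB moves the pair from (480MB, 496MB) to (224MB, 240MB) automatically. With candidates 2 and 3 the old values stay at 480MB and above, so `is_sane` fails against the new limit until the user updates them, which is the "Cons." listed for both.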


So, I now vote for 3, because hierarchy support is easiest and it is handy enough
for real use.

Thanks,
-Kame
