Re: [PATCH v6] memcg: event control at vmpressure.

Michal Hocko <mhocko@xxxxxxx> · Fri, 21 Jun 2013 11:19:44 +0200

On Fri 21-06-13 10:22:34, Minchan Kim wrote:
> On Fri, Jun 21, 2013 at 09:24:38AM +0900, Hyunhee Kim wrote:
> > In the original vmpressure, events are triggered whenever there is reclaim
> > activity. This adds overhead for the user space module and also increases
> 
> Not true.
> We have lots of filters that avoid triggering events even while reclaim is going on.
> Your statement is confusing.

Where is the filter implemented? In the kernel? I do not see any
throttling in the current mm tree.

> > power consumption if there is somebody to listen to it. This patch provides
> > options to trigger events only when the pressure level changes.
> > This trigger option can be set when registering each event by writing
> > a trigger option, "edge" or "always", next to the string of levels.
> > "edge" means that the event is triggered only when the pressure level is changed.
> > "always" means that events are triggered whenever there is a reclaim process.
>                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                                                   Not true, either.

Is this about vmpressure_win? But I agree that this could be more
specific. Something like "`Always' trigger option will signal all events
while `edge' option will trigger events only when the level changes."

> > To keep backward compatibility, "always" is set by default if no option
> > is given. Each event can have a different option. For example,
> > "low" level uses "always" trigger option to see reclaim activity at user space
> > while "medium"/"critical" uses "edge" to do an important job
> > like killing tasks only once.
> 
> Question.
> 
> 1. user: set critical edge
> 2. kernel: memory is tight and triggers an event with critical
> 3. user: kills a program when he receives the event
> 4. kernel: memory is very tight again and wants to trigger an event
>    with critical but fails because last_level was critical and it was edge.
> 
> Right?

Yes, this is the risk of edge triggering, and the user has to be
prepared for that. I still think that it makes some sense to have the
two modes.

> > @@ -823,7 +831,7 @@ Test:
> >     # cd /sys/fs/cgroup/memory/
> >     # mkdir foo
> >     # cd foo
> > -   # cgroup_event_listener memory.pressure_level low &
> > +   # cgroup_event_listener memory.pressure_level low edge &
> >     # echo 8000000 > memory.limit_in_bytes
> >     # echo 8000000 > memory.memsw.limit_in_bytes
> >     # echo $$ > tasks
> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> > index 736a601..a08252e 100644
> > --- a/mm/vmpressure.c
> > +++ b/mm/vmpressure.c
> > @@ -137,6 +137,8 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
> >  struct vmpressure_event {
> >  	struct eventfd_ctx *efd;
> >  	enum vmpressure_levels level;
> > +	int last_level;
> 
> int? but level is enum vmpressure_levels?

good catch

> > +	bool edge_trigger;
> >  	struct list_head node;
> >  };
> >  
> > @@ -153,11 +155,14 @@ static bool vmpressure_event(struct vmpressure *vmpr,
> >  
> >  	list_for_each_entry(ev, &vmpr->events, node) {
> >  		if (level >= ev->level) {
> > +			if (ev->edge_trigger && level == ev->last_level)
> > +				continue;
> > +
> >  			eventfd_signal(ev->efd, 1);
> >  			signalled = true;
> >  		}
> > +		ev->last_level = level;
> >  	}
> > -
> 
> Unnecessary change.
> 
> >  	mutex_unlock(&vmpr->events_lock);
> >  
> >  	return signalled;
> > @@ -290,9 +295,11 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
> >   *
> >   * This function associates eventfd context with the vmpressure
> >   * infrastructure, so that the notifications will be delivered to the
> > - * @eventfd. The @args parameter is a string that denotes pressure level
> > + * @eventfd. The @args parameters are a string that denotes pressure level
> >   * threshold (one of vmpressure_str_levels, i.e. "low", "medium", or
> > - * "critical").
> > + * "critical") and a trigger option that decides whether events are triggered
> > + * continuously or only on edge ("always" or "edge"). If "edge", events
> > + * are triggered only when the pressure level changes.
> >   *
> >   * This function should not be used directly, just pass it to (struct
> >   * cftype).register_event, and then cgroup core will handle everything by
> > @@ -303,22 +310,43 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> >  {
> >  	struct vmpressure *vmpr = cg_to_vmpressure(cg);
> >  	struct vmpressure_event *ev;
> > +	char *strlevel, *strtrigger;
> >  	int level;
> > +	bool trigger;
> 
> What trigger?
> Would be better to use "bool edge" instead?

yes

> > +
> > +	strlevel = args;
> > +	strtrigger = strchr(args, ' ');
> > +
> > +	if (strtrigger) {
> > +		*strtrigger = '\0';
> > +		strtrigger++;
> > +	}
> >  
> >  	for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) {
> > -		if (!strcmp(vmpressure_str_levels[level], args))
> > +		if (!strcmp(vmpressure_str_levels[level], strlevel))
> >  			break;
> >  	}
> >  
> >  	if (level >= VMPRESSURE_NUM_LEVELS)
> >  		return -EINVAL;
> >  
> > +	if (strtrigger == NULL)
> > +		trigger = false;
> > +	else if (!strcmp(strtrigger, "always"))
> > +		trigger = false;
> > +	else if (!strcmp(strtrigger, "edge"))
> > +		trigger = true;
> > +	else
> > +		return -EINVAL;
> > +
> >  	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> >  	if (!ev)
> >  		return -ENOMEM;
> >  
> >  	ev->efd = eventfd;
> >  	ev->level = level;
> > +	ev->last_level = -1;
> 
> VMPRESSURE_NONE is better?

Yes
-- 
Michal Hocko
SUSE Labs
