Re: [PATCH v6] memcg: event control at vmpressure.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2013/6/21 Michal Hocko <mhocko@xxxxxxx>:
> On Fri 21-06-13 10:22:34, Minchan Kim wrote:
>> On Fri, Jun 21, 2013 at 09:24:38AM +0900, Hyunhee Kim wrote:
>> > In the original vmpressure, events are triggered whenever there is a reclaim
>> > activity. This becomes overheads to user space module and also increases
>>
>> Not true.
>> We have lots of filter to not trigger event even if reclaim is going on.
>> Your statement would make confuse.
>
> Where is the filter implemented? In the kernel? I do not see any
> throttling in the current mm tree.

Thanks for your comments.
As Minchan said, vmpressure_win can filter some of event sinals. But,
when I tested, lots of events are still signaled if a task consume a
lot of memory.
I'll change the expression "whenever there is a reclaim activity".

>
>> > power consumption if there is somebody to listen to it. This patch provides
>> > options to trigger events only when the pressure level changes.
>> > This trigger option can be set when registering each event by writing
>> > a trigger option, "edge" or "always", next to the string of levels.
>> > "edge" means that the event is triggered only when the pressure level is changed.
>> > "always" means that events are triggered whenever there is a reclaim process.
>>                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>                                                   Not true, either.
>
> Is this about vmpressure_win? But I agree that this could be more
> specific. Something like "`Always' trigger option will signal all events
> while `edge' option will trigger only events when the level changes."

I also agree. I'll improve it.

>
>> > To keep backward compatibility, "always" is set by default if nothing is input
>> > as an option. Each event can have different option. For example,
>> > "low" level uses "always" trigger option to see reclaim activity at user space
>> > while "medium"/"critical" uses "edge" to do an important job
>> > like killing tasks only once.
>>
>> Question.
>>
>> 1. user: set critical edge
>> 2. kernel: memory is tight and trigger event with critical
>> 3. user: kill a program when he receives a event
>> 4. kernel: memory is very tight again and want to trigger a event
>>    with critical but fail because last_level was critical and it was edge.
>>
>> Right?
>
> yes, this is the risk of the edge triggering and the user has to be
> prepared for that. I still think that it makes some sense to have the
> two modes.

Right. the above scenario is possible to happen.
And, as Michal said, I also think that a user should handle this
situation. This could be the users' choice to handle continuous events
or handle the above situation by having two modes.

>
>> > @@ -823,7 +831,7 @@ Test:
>> >     # cd /sys/fs/cgroup/memory/
>> >     # mkdir foo
>> >     # cd foo
>> > -   # cgroup_event_listener memory.pressure_level low &
>> > +   # cgroup_event_listener memory.pressure_level low edge &
>> >     # echo 8000000 > memory.limit_in_bytes
>> >     # echo 8000000 > memory.memsw.limit_in_bytes
>> >     # echo $$ > tasks
>> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> > index 736a601..a08252e 100644
>> > --- a/mm/vmpressure.c
>> > +++ b/mm/vmpressure.c
>> > @@ -137,6 +137,8 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>> >  struct vmpressure_event {
>> >     struct eventfd_ctx *efd;
>> >     enum vmpressure_levels level;
>> > +   int last_level;
>>
>> int? but level is enum vmpressure_levels?
>
> good catch

Ok, I'll fix it.

>
>> > +   bool edge_trigger;
>> >     struct list_head node;
>> >  };
>> >
>> > @@ -153,11 +155,14 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>> >
>> >     list_for_each_entry(ev, &vmpr->events, node) {
>> >             if (level >= ev->level) {
>> > +                   if (ev->edge_trigger && level == ev->last_level)
>> > +                           continue;
>> > +
>> >                     eventfd_signal(ev->efd, 1);
>> >                     signalled = true;
>> >             }
>> > +           ev->last_level = level;
>> >     }
>> > -
>>
>> Unnecessary change.

Ok.

>>
>> >     mutex_unlock(&vmpr->events_lock);
>> >
>> >     return signalled;
>> > @@ -290,9 +295,11 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
>> >   *
>> >   * This function associates eventfd context with the vmpressure
>> >   * infrastructure, so that the notifications will be delivered to the
>> > - * @eventfd. The @args parameter is a string that denotes pressure level
>> > + * @eventfd. The @args parameters are a string that denotes pressure level
>> >   * threshold (one of vmpressure_str_levels, i.e. "low", "medium", or
>> > - * "critical").
>> > + * "critical") and a trigger option that decides whether events are triggered
>> > + * continuously or only on edge ("always" or "edge" if "edge", events
>> > + * are triggered when the pressure level changes.
>> >   *
>> >   * This function should not be used directly, just pass it to (struct
>> >   * cftype).register_event, and then cgroup core will handle everything by
>> > @@ -303,22 +310,43 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>> >  {
>> >     struct vmpressure *vmpr = cg_to_vmpressure(cg);
>> >     struct vmpressure_event *ev;
>> > +   char *strlevel, *strtrigger;
>> >     int level;
>> > +   bool trigger;
>>
>> What trigger?
>> Would be better to use "bool egde" instead?
>
> yes

Ok. edge is better.

>
>> > +
>> > +   strlevel = args;
>> > +   strtrigger = strchr(args, ' ');
>> > +
>> > +   if (strtrigger) {
>> > +           *strtrigger = '\0';
>> > +           strtrigger++;
>> > +   }
>> >
>> >     for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) {
>> > -           if (!strcmp(vmpressure_str_levels[level], args))
>> > +           if (!strcmp(vmpressure_str_levels[level], strlevel))
>> >                     break;
>> >     }
>> >
>> >     if (level >= VMPRESSURE_NUM_LEVELS)
>> >             return -EINVAL;
>> >
>> > +   if (strtrigger == NULL)
>> > +           trigger = false;
>> > +   else if (!strcmp(strtrigger, "always"))
>> > +           trigger = false;
>> > +   else if (!strcmp(strtrigger, "edge"))
>> > +           trigger = true;
>> > +   else
>> > +           return -EINVAL;
>> > +
>> >     ev = kzalloc(sizeof(*ev), GFP_KERNEL);
>> >     if (!ev)
>> >             return -ENOMEM;
>> >
>> >     ev->efd = eventfd;
>> >     ev->level = level;
>> > +   ev->last_level = -1;
>>
>> VMPRESSURE_NONE is better?
>
> Yes

Ok, I'll add VMPRESSURE_NONE. It will be better.

Thanks for review.
Hyunhee Kim.

> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]