Re: [PATCH] memcg: event control at vmpressure.

Kyungmin Park <kmpark@xxxxxxxxxxxxx> · Tue, 11 Jun 2013 10:01:27 +0900

On Tue, Jun 11, 2013 at 9:17 AM, Anton Vorontsov <anton@xxxxxxxxxx> wrote:

On Mon, Jun 10, 2013 at 05:12:58PM +0200, Michal Hocko wrote:

> > +           if (level >= ev->level && level != vmpr->current_level) {

> >                     eventfd_signal(ev->efd, 1);

> >                     signalled = true;

> > +                   vmpr->current_level = level;

>

> This would mean that you send a signal for, say, VMPRESSURE_LOW, then

> the reclaim finishes and two days later when you hit the reclaim again

> you would simply miss the event, right?

>

> So, unless I am missing something, then this is plain wrong.

Yup, in its current version, it is not acceptable. For example, sometimes

we do want to see all the _LOW events, since _LOW level shows not just the

level itself, but the activity (i.e. reclaiming process).

There are a few ways to make both parties happy, though.

If the app wants to implement the time-based throttling, then just close

the fd and sleep for the needed amount of time (or do not read from the

eventfd -- the kernel will then just increment the eventfd counter, so at

least there won't be context switches). Doing the time-based throttling

in the kernel won't buy us much, I believe.
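
To illustrate the "do not read from the eventfd" option, here is a minimal
userspace sketch. It assumes a plain eventfd (no EFD_SEMAPHORE) and stands in
for the kernel's notifications with ordinary write() calls; the helper names
post_events() and drain_events() are made up for this example and are not part
of any kernel or libc API.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Stand-in for the kernel side: each write adds 1 to the eventfd counter.
 * (In the real setup the kernel signals the eventfd; here we fake it from
 * userspace so the sketch runs without a memcg.)
 */
int post_events(int efd, unsigned int n)
{
    const uint64_t one = 1;

    for (unsigned int i = 0; i < n; i++) {
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
            return -1;
    }
    return 0;
}

/*
 * Application side: a single read returns the accumulated count and resets
 * the counter to zero. Returns -1 on error, including EAGAIN when the fd is
 * non-blocking and nothing has been posted.
 */
int64_t drain_events(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

An application that throttles itself would sleep for as long as it likes and
then call drain_events() once: the events that arrived in the meantime are
collapsed into one number, and nothing woke the process while it slept.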

Or, if you still want the "one-shot"/"edge-triggered" events (which might

make perfect sense for medium and critical levels), then I'd propose to

add some additional flag when you register the event, so that the old

behaviour would still be available for those who need it. This approach I

think is the best one.
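
A rough sketch of what such a per-registration flag could look like, written
as standalone C rather than as kernel code. Everything here is hypothetical:
the enum names, struct listener, the edge_triggered field and should_signal()
are invented for illustration and do not come from the patch or from
vmpressure.c. The only part taken from the thread is the condition from the
quoted diff, "level >= ev->level && level != current_level", which this
sketch applies only when the flag is set.

```c
#include <stdbool.h>

/* Hypothetical levels; LEVEL_NONE means "nothing signalled yet". */
enum pressure_level {
    LEVEL_NONE = -1,
    PRESSURE_LOW,
    PRESSURE_MEDIUM,
    PRESSURE_CRITICAL,
};

/* Hypothetical per-registration state. */
struct listener {
    enum pressure_level level;      /* threshold chosen at registration */
    bool edge_triggered;            /* the proposed opt-in flag */
    enum pressure_level last_level; /* last level this listener was told about */
};

/*
 * Decide whether to notify a listener about the given level.
 * Without the flag, every event at or above the threshold is delivered
 * (the old behaviour). With the flag, an event is delivered only when
 * the level differs from the last one delivered to this listener.
 */
bool should_signal(struct listener *l, enum pressure_level level)
{
    if (level < l->level)
        return false;
    if (l->edge_triggered && level == l->last_level)
        return false;

    l->last_level = level;
    return true;
}
```

Keeping the last level per listener, instead of in one shared field, means a
listener that did not ask for the flag still sees every _LOW event, which is
the case the objection above is about.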

Ok we will prepare this way and resend it.

Thank you,
Kyungmin Park