On Tue, Jun 11, 2013 at 9:17 AM, Anton Vorontsov <anton@xxxxxxxxxx> wrote:
On Mon, Jun 10, 2013 at 05:12:58PM +0200, Michal Hocko wrote:
> > + if (level >= ev->level && level != vmpr->current_level) {
> > eventfd_signal(ev->efd, 1);
> > signalled = true;
> > + vmpr->current_level = level;
>
> This would mean that you send a signal for, say, VMPRESSURE_LOW, then
> the reclaim finishes and two days later when you hit the reclaim again
> you would simply miss the event, right?
>
> So, unless I am missing something, then this is plain wrong.

Yup, in its current version, it is not acceptable. For example, sometimes
we do want to see all the _LOW events, since _LOW level shows not just the
level itself, but the activity (i.e. reclaiming process).

There are a few ways to make both parties happy, though.

If the app wants to implement time-based throttling, then it can just close
the fd and sleep for the needed amount of time (or not read from the
eventfd -- the kernel will then just increment the eventfd counter, so there
won't be context switches, at least). Doing the time-based throttling
in the kernel won't buy us much, I believe.
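
To illustrate the "do not read from the eventfd" option, here is a minimal
userspace sketch. It assumes a plain (non-semaphore) eventfd and uses write()
as a stand-in for the kernel-side eventfd_signal(efd, 1); the helper names
signal_event, drain_events and coalesced_after are made up for this example
and are not part of any kernel or libc API.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Stand-in for the kernel's eventfd_signal(efd, 1): add 1 to the counter. */
static int signal_event(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the counter. Returns how many notifications piled up
 * since the last read, or 0 if there were none (or the read failed).
 */
static uint64_t drain_events(int efd)
{
	uint64_t count = 0;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}

/*
 * Simulate n notifications arriving while the app is "sleeping",
 * followed by a single read once it wakes up.
 */
static uint64_t coalesced_after(unsigned int n)
{
	int efd = eventfd(0, EFD_NONBLOCK);
	uint64_t got;
	unsigned int i;

	if (efd < 0)
		return 0;
	for (i = 0; i < n; i++)
		signal_event(efd);
	got = drain_events(efd);
	close(efd);
	return got;
}
```

With this, an app that sleeps through several notifications and then reads
once sees a single wakeup carrying the accumulated count, so the throttling
interval can be chosen entirely in userspace.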

Or, if you still want the "one-shot"/"edge-triggered" events (which might
make perfect sense for medium and critical levels), then I'd propose
adding an additional flag when you register the event, so that the old
behaviour would still be available for those who need it. I think this
approach is the best one.
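
A rough userspace model of the flag idea, just to make the two behaviours
concrete. Everything here is an assumption for illustration: the struct
listener layout, the edge_triggered field and the notify/deliver_all helpers
are hypothetical, and the state is kept per listener rather than in vmpr as
in the quoted patch. Only the VMPRESSURE_* level names come from the thread.

```c
#include <stdbool.h>

enum vmpressure_level {
	VMPRESSURE_LOW,
	VMPRESSURE_MEDIUM,
	VMPRESSURE_CRITICAL,
};

/* Hypothetical per-listener state; field names are illustrative only. */
struct listener {
	enum vmpressure_level level;	/* threshold chosen at registration */
	bool edge_triggered;		/* hypothetical opt-in flag */
	int last_level;			/* last level delivered, -1 if none */
	unsigned int delivered;		/* number of notifications sent */
};

/* Returns true if this listener would be signalled for the given level. */
static bool notify(struct listener *l, enum vmpressure_level level)
{
	if (level < l->level)
		return false;
	/* Edge-triggered listeners only hear about level changes. */
	if (l->edge_triggered && (int)level == l->last_level)
		return false;
	l->last_level = (int)level;
	l->delivered++;
	return true;
}

/* Feed a sequence of pressure levels and count what gets delivered. */
static unsigned int deliver_all(struct listener *l,
				const enum vmpressure_level *seq,
				unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		notify(l, seq[i]);
	return l->delivered;
}
```

Without the flag a listener registered for _LOW receives every event, which
keeps the "activity" information. With the flag set, a repeated level is
suppressed, which is also why it should stay opt-in: two _LOW events in a row
collapse into one no matter how far apart they are.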

OK, we will prepare it this way and resend it.

Thank you,
Kyungmin Park