Re: [LSF/MM TOPIC] proposals for topics

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 26 Jan 2016 18:07:52 +0100

On 01/25/2016 07:45 PM, Johannes Weiner wrote:
>> - One of the long-lasting issues related to OOM handling is when to
>>   actually declare OOM. There are workloads which might be thrashing on
>>   the few last remaining pagecache pages or on swap, which makes the
>>   system completely unusable for a considerable amount of time, yet the
>>   OOM killer is not invoked. Can we finally do something about that?
>
> I'm working on this, but it's not an easy situation to detect.
>
> We can't decide based on the amount of page cache, as you could have very
> little of it and still be fine. Most of it could still be used-once.
>
> We can't decide based on the number or rate of (re)faults, because this
> spikes during startup and workingset changes, or can even be sustained
> when working with a data set that you'd never expect to fit into
> memory in the first place, while still making acceptable progress.

I would hope that workingset detection could help distinguish workloads
that thrash due to low memory from those that can't fit no matter what?
Or would it require tracking the lifetime of so many evicted pages that
the memory overhead would be infeasible?

> The only thing that I could come up with as a meaningful metric here
> is the share of actual walltime that is spent waiting on refetching
> stuff from disk. If we know that in the last X seconds, the whole
> system spent more than idk 95% of its time waiting on the disk to read
> recently evicted data back into the cache, then it's time to kick the
> OOM killer, as this state is likely not worth maintaining.
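
To make the proposed metric concrete, here is a rough userspace-style
sketch in Python, not kernel code and not anyone's actual patch. It
assumes something upstream can already report the intervals during which
the whole system was stalled waiting on refault reads; that part is the
hard bit and is not shown. The names ThrashingMeter, record_stall,
window_s and threshold are hypothetical, the 10 second window is an
arbitrary stand-in for "X seconds", and 0.95 is just the "idk 95%" figure.

```python
from collections import deque


class ThrashingMeter:
    """Share of wall time in a sliding window spent stalled on refaults.

    Hypothetical helper: intervals are assumed to be non-overlapping and
    recorded in time order, each one a period in which the whole system
    was waiting on recently evicted data being read back in.
    """

    def __init__(self, window_s=10.0, threshold=0.95):
        self.window_s = window_s    # length of the look-back window ("X")
        self.threshold = threshold  # share of the window that counts as thrashing
        self.stalls = deque()       # (start, end) timestamps in seconds

    def record_stall(self, start, end):
        if end < start:
            raise ValueError("stall interval ends before it starts")
        self.stalls.append((start, end))

    def share(self, now):
        """Fraction of the last window_s seconds spent stalled."""
        window_start = now - self.window_s
        # Forget intervals that ended before the window began.
        while self.stalls and self.stalls[0][1] <= window_start:
            self.stalls.popleft()
        stalled = 0.0
        for start, end in self.stalls:
            # Count only the part of each interval inside the window.
            lo = max(start, window_start)
            hi = min(end, now)
            if hi > lo:
                stalled += hi - lo
        return stalled / self.window_s

    def should_invoke_oom(self, now):
        return self.share(now) > self.threshold


meter = ThrashingMeter(window_s=10.0, threshold=0.95)
meter.record_stall(0.0, 9.8)  # stalled for almost the entire window
print(meter.should_invoke_oom(now=10.0))
```

Because the decision is based on time lost rather than on fault counts,
a startup spike or a streaming workload that faults a lot but still makes
progress would stay well below the threshold in this model.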

> Such a "thrashing time" metric could be great to export to userspace
> in general as it can be useful in other situations, such as quickly
> gauging how comfortable a workload is (inside a container), and how
> much time is wasted due to underprovisioning of memory. Because it
> isn't just the pathological cases, you might just wait a bit here and
> there and it could still add up to a sizable portion of a job's time.
>
> If other people think this could be a useful thing to talk about, I'd
> be happy to discuss it at the conference.

I think this discussion would be useful, yeah.