On Thu, Aug 18, 2016 at 11:01 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Thu 18-08-16 10:47:57, Sonny Rao wrote:
>> On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>> > On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> [...]
>> >> 2) User space OOM handling -- we'd rather do a more graceful
>> >> shutdown than let the kernel's OOM killer activate. We need to
>> >> gather this information to make that decision, and we'd like to
>> >> be able to get it much faster than 400ms
>> >
>> > Global OOM handling in userspace is really dubious if you ask me. I
>> > understand you want something better than SIGKILL, and in fact this
>> > is already possible with the memory cgroup controller (btw. memcg
>> > will give you cheap access to rss and the amount of shared and
>> > swapped out memory as well). Anyway, if you are getting close to
>> > OOM your system will most probably be really busy, and chances are
>> > that reading your new file will also take much more time. I am also
>> > not quite sure how pss is useful for oom decisions.
>>
>> I mentioned it before, but based on experience RSS just isn't good
>> enough -- there's too much sharing going on in our use case to make
>> the correct decision based on RSS. If RSS were good enough, simply
>> put, this patch wouldn't exist.
>
> But that doesn't answer my question, I am afraid. So how exactly do
> you use pss for oom decisions?

We use PSS to calculate the memory used by each process relative to
all the other processes in the system. In the case of Chrome, this
tells us how much each renderer process (which is roughly tied to a
particular "tab" in Chrome) is using and how much it has swapped out,
so we know which processes are the worst offenders -- I'm not sure
what's unclear about that.

Chrome tends to use a lot of shared memory, so we have found PSS to
be a much better signal than RSS, and I can give you examples of RSS
and PSS from real systems to illustrate the magnitude of the
difference between those two numbers if that would be useful.
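To make that concrete, here is a rough sketch of the userspace side
(an illustration for this thread, not our actual code): it sums the
per-VMA Pss: and Swap: fields from /proc/<pid>/smaps, which is exactly
the whole-file parse that makes the current interface so expensive.

/* Hypothetical illustration only -- not the Chrome OS implementation.
 * Sum the per-VMA "Pss:" and "Swap:" fields from /proc/<pid>/smaps to
 * get process-wide totals. */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	unsigned long kb, pss_kb = 0, swap_kb = 0;
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}

	/* smaps emits a header plus ~20 fields per VMA; we only want
	 * two of them, but the kernel formats (and we must parse) all
	 * of it, for every mapping the process has. */
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "Pss: %lu kB", &kb) == 1)
			pss_kb += kb;
		else if (sscanf(line, "Swap: %lu kB", &kb) == 1)
			swap_kb += kb;
	}
	fclose(f);

	printf("pid %s: Pss %lu kB, Swap %lu kB\n", argv[1], pss_kb,
	       swap_kb);
	return 0;
}

Doing that walk for every process on the system is what gets slow;
totmaps would return the same totals from a single read without all
the per-VMA formatting and parsing in between.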
>
>> So even with memcg I think we'd have the same problem?
>
> memcg will give you instant anon, shared counters for all processes in
> the memcg.

We want to be able to get per-process granularity quickly. I'm not
sure memcg provides that.

>> > Don't get me wrong, /proc/<pid>/totmaps might be suitable for your
>> > specific use case, but so far I haven't heard any sound argument
>> > for it being generally useful. It is true that smaps is
>> > unnecessarily costly, but at least I can see some room for
>> > improvement. A simple patch I've posted cut the formatting
>> > overhead by 7%. Maybe we can do more.
>>
>> It seems like a general problem that the existing kernel interface
>> for these values is very expensive, so totmaps would be useful to
>> any application that wants a per-process PSS, private data, dirty
>> data or swap value.
>
> Yes, this is really unfortunate, and if at all possible we should
> address it. Precise values require the expensive rmap walk. We can
> introduce some caching to help with that. But so far it seems the
> biggest overhead is simply formatting the output, and that should be
> addressed before any new proc file is added.
>
>> I mentioned two use cases, but I guess I don't understand the
>> comment about why it's not usable by other use cases.
>
> I might be wrong here, but use of pss is quite limited and I do not
> remember anybody asking for large optimizations in that area. I
> still do not understand your use cases properly, so I am quite
> skeptical about the general usefulness of a new file.

How do you know that usage of PSS is quite limited? I can only say
that we've been using it on Chromium OS for at least four years and
have found it very valuable, and I think I've explained the use cases
in this thread. If you have more specific questions then I can try to
clarify.

>
> --
> Michal Hocko
> SUSE Labs