On Fri 19-08-16 11:26:34, Minchan Kim wrote:
> Hi Michal,
>
> On Thu, Aug 18, 2016 at 08:01:04PM +0200, Michal Hocko wrote:
> > On Thu 18-08-16 10:47:57, Sonny Rao wrote:
> > > On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> > [...]
> > > >> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> > > >> than let the kernel's OOM killer activate. We need to gather this
> > > >> information to make the decision, and we'd like to be able to get it
> > > >> much faster than 400ms.
> > > >
> > > > Global OOM handling in userspace is really dubious if you ask me. I
> > > > understand you want something better than SIGKILL, and in fact this is
> > > > already possible with the memory cgroup controller (btw. memcg will
> > > > give you cheap access to rss and to the amount of shared and swapped
> > > > out memory as well). Anyway, if you are getting close to OOM, your
> > > > system will most probably be really busy, and chances are that reading
> > > > your new file will take much more time as well. I am also not quite
> > > > sure how pss is useful for oom decisions.
> > >
> > > I mentioned it before, but based on experience RSS just isn't good
> > > enough -- there's too much sharing going on in our use case to make
> > > the correct decision based on RSS. If RSS were good enough, simply
> > > put, this patch wouldn't exist.
> >
> > But that doesn't answer my question, I am afraid. So how exactly do you
> > use pss for oom decisions?
>
> My case is not about OOM decisions, but I agree it would be great if we
> could get a *fast* smaps summary.
>
> PSS is a really great tool to figure out how processes consume memory,
> much more precisely than RSS. We have been using it for per-process
> memory monitoring. Although it is not used for OOM decisions, it would
> be great if it were sped up, because we don't want to spend a lot of
> CPU time on mere monitoring.
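To make the RSS-vs-PSS point above concrete: PSS charges each shared page as its size divided by the number of sharers, so the two totals diverge exactly when mappings are shared. A minimal userspace sketch that sums both fields from smaps-formatted text (field names as in the current /proc/<pid>/smaps output):

```python
def smaps_totals(text):
    """Sum the Rss and Pss fields (in kB) over all VMAs in an
    smaps-formatted string.  With heavy sharing, the Pss total
    can be far below the Rss total -- the gap this thread is about."""
    totals = {"Rss": 0, "Pss": 0}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in totals:
            totals[key] += int(rest.split()[0])  # first token is the kB value
    return totals
```

Note this is exactly the per-line text parsing the discussion below argues is cheaper than the kernel-side formatting that produces it.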
> For our usecase, we don't need AnonHugePages, ShmemPmdMapped,
> Shared_Hugetlb, Private_Hugetlb, KernelPageSize, or MMUPageSize,
> because we never enable THP or hugetlb. Additionally, Locked can be
> derived from the vma flags, so we don't need it either. We don't even
> need the address ranges for plain monitoring, when we are not
> investigating in detail.
>
> Although none of this is severe overhead, why emit the useless
> information at all? It even bloats day by day. :( On top of that,
> userspace tools have to spend more time parsing it, which is pointless.

So far it doesn't really seem that the parsing is the biggest problem.
The major cycles killer is the output formatting, and that doesn't sound
like a problem we are unable to address. I would even argue that we want
to address it in as generic a way as possible.

> Having said that, I'm no fan of creating a new stat knob for this,
> either. How about appending the summary information at the end of
> smaps? Monitoring users could then just open the file, lseek to
> (end - 1), and read only the summary.

That might confuse existing parsers. Besides that, we already have
/proc/<pid>/statm, which gives cumulative numbers. I am not sure how
often it is used, or whether the pte walk is too expensive for existing
users, but that should be explored and evaluated before a new file is
created.

/proc became a dump of everything people found interesting just because
we were too lax about allowing such additions. Let's not repeat those
mistakes, please!
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
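For reference, the /proc/<pid>/statm interface mentioned above already exposes cumulative counters in a single line with no per-VMA formatting. A sketch of decoding one such line (field order per proc(5); values are in pages, with a 4 KiB page size assumed here for the kB conversion):

```python
def parse_statm(line, page_kb=4):
    """Decode a /proc/<pid>/statm line into a dict of kB values.
    The seven fields, per proc(5), are: size, resident, shared,
    text, lib, data, dt -- all counted in pages.  Note that
    'resident' has RSS semantics (shared pages counted in full),
    which is why statm cannot substitute for PSS."""
    names = ["size", "resident", "shared", "text", "lib", "data", "dt"]
    vals = [int(x) for x in line.split()]
    return {n: v * page_kb for n, v in zip(names, vals)}
```

The whole read-and-decode path is a handful of integer conversions, which is what makes statm worth evaluating before adding yet another file.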