On Fri 28-06-13 00:35:28, Minchan Kim wrote:
> Hi Michal,
>
> On Thu, Jun 27, 2013 at 11:37:21AM +0200, Michal Hocko wrote:
> > On Thu 27-06-13 15:12:10, Hyunhee Kim wrote:
> > > In vmpressure, the pressure level is calculated based on the ratio
> > > of how many pages were scanned vs. reclaimed in a given time window.
> > > However, there is a possibility that "scanned < reclaimed" in such a
> > > case, when reclaiming ends by fatal signal in shrink_inactive_list.
> > > So, with this patch, we just return "low" level when "scanned < reclaimed"
> > > happens not to have userland miss reclaim activity.
> >
> > Hmm, fatal signal pending on kswapd doesn't make sense to me so it has
> > to be a direct reclaim path. Does it really make sense to signal LOW
> > when there is probably a big memory pressure and somebody is killing the
> > current allocator?
>
> So, do you want to trigger critical instead of low?
>
> Now, current is going to die so we can expect shortly we can get an amount
> of memory, normally.

And also consider that this is a per-memcg interface. And so it is even
more complicated. If a task dies then there is _no_ guarantee that there
will be an uncharge in that group (the task could have been migrated to
that group so the memory belongs to somebody else).

> but yeah, we cannot be sure it happens within a bounded time since it
> couldn't use reserved memory pool unlike process killed by OOM.

The situation should be detected (I am not entirely sure how - e.g.
checking for fatal_signals in vmpressure directly) but we shouldn't
assume that scanned < reclaimed has any impact on the freed memory.

> If we send critical but there isn't big memory pressure, maybe
> critical handler would kill some process and the result is that
> killing another process unnecessary. That's really a thing we should
> avoid.
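
To make the "scanned < reclaimed" case concrete, here is a minimal
userspace sketch of a scanned-vs-reclaimed percentage of the kind
described above. It is not the kernel code: the formula is modelled on
the ratio described in this thread, and the thresholds (60 and 95), the
function names calc_level_unguarded() and calc_level_checked(), and the
LEVEL_INCONSISTENT value are all assumptions made up for illustration.
It only shows that with unsigned arithmetic a window where reclaimed
exceeds scanned wraps around and reads as the highest level, and that
such a window can be reported separately without deciding the policy.

```c
/* Userspace illustration only; names and thresholds are hypothetical. */
enum level { LEVEL_LOW, LEVEL_MEDIUM, LEVEL_CRITICAL, LEVEL_INCONSISTENT };

#define LEVEL_MED_PCT      60UL	/* assumed threshold, in percent */
#define LEVEL_CRITICAL_PCT 95UL	/* assumed threshold, in percent */

static enum level pct_to_level(unsigned long pressure)
{
	if (pressure >= LEVEL_CRITICAL_PCT)
		return LEVEL_CRITICAL;
	if (pressure >= LEVEL_MED_PCT)
		return LEVEL_MEDIUM;
	return LEVEL_LOW;
}

/*
 * Percentage of scanned pages that were NOT reclaimed, with no check
 * for reclaimed > scanned. In that case the subtraction wraps around
 * (well defined for unsigned types) and yields a huge value.
 */
static enum level calc_level_unguarded(unsigned long scanned,
				       unsigned long reclaimed)
{
	unsigned long scale = scanned + reclaimed;
	unsigned long pressure;

	if (!scanned)
		return LEVEL_LOW;	/* nothing scanned, avoid dividing by 0 */

	pressure = scale - (reclaimed * scale / scanned);
	pressure = pressure * 100 / scale;

	return pct_to_level(pressure);
}

/*
 * Same calculation, but a window with reclaimed > scanned is reported
 * as inconsistent instead of being mapped to any pressure level. What
 * the caller should do with it is left open on purpose.
 */
static enum level calc_level_checked(unsigned long scanned,
				     unsigned long reclaimed)
{
	if (reclaimed > scanned)
		return LEVEL_INCONSISTENT;

	return calc_level_unguarded(scanned, reclaimed);
}
```

For example, calc_level_unguarded(32, 40) ends up as LEVEL_CRITICAL
purely because of the wrap-around, while calc_level_checked(32, 40)
returns LEVEL_INCONSISTENT and leaves the decision to the caller.
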
>
> If we send low but there is a big memory pressure, at least, userland
> could be notified and it has a chance to release small memory, which will
> help to exit current process so that it could prevent OOM kill and killing
> another process unnecessary.
>
> If we send low but there isn't big memory pressure, totally, we will save
> a process.
>
> >
> > The THP case made sense because nr_scanned is in LRU elements units
> > while nr_reclaimed is in page units which are different so nr_reclaim
> > might be higher than nr_scanned (so nr_taken would be more appropriate
> > for vmpressure).
>
> In case of THP, 512 pages is equal to vmpressure_win so if we change
> nr_scanned with nr_taken, it could easily make vmpressure notifier

Wasn't 512 selected for vmpressure_win exactly for this reason?
Shouldn't we rather fix that assumption? Comparing scanned to reclaimed
when they operate on different units just sounds strange to me.

> level critical even if VM encounters a recently referenced THP page from
> LRU tail so I'd like to ignore THP page effect in vmpressure level
> calculation.
[...]
-- 
Michal Hocko
SUSE Labs