On Fri, Oct 29, 2010 at 9:04 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Fri, 29 Oct 2010 08:28:23 +0900
> Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>
>> On Fri, Oct 29, 2010 at 7:03 AM, Mandeep Singh Baines <msb@xxxxxxxxxxxx> wrote:
>> > Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx) wrote:
>> >> On Thu, 28 Oct 2010 12:15:23 -0700
>> >> Mandeep Singh Baines <msb@xxxxxxxxxxxx> wrote:
>> >>
>> >> > On ChromiumOS, we do not use swap.
>> >>
>> >> Well that's bad. Why not?
>> >>
>> >
>> > We're using SSDs. We're still in the "make it work" phase, so we wanted
>> > to avoid swap unless/until we learn how to use it effectively with
>> > an SSD.
>> >
>> > You'll want to tune swap differently if you're using an SSD. Not sure
>> > if swappiness is the answer. Maybe a new tunable to control how aggressive
>> > swap is, unless such a thing already exists?
>> >
>> >> > When memory is low, the only way to
>> >> > free memory is to reclaim pages from the file list. This results in a
>> >> > lot of thrashing under low memory conditions. We see the system become
>> >> > unresponsive for minutes before it eventually OOMs. We also see very
>> >> > slow browser tab switching under low memory. Instead of an unresponsive
>> >> > system, we'd really like the kernel to OOM as soon as it starts to
>> >> > thrash. If it can't keep the working set in memory, then OOM.
>> >> > Losing one of many tabs is a better behaviour for the user than an
>> >> > unresponsive system.
>> >> >
>> >> > This patch creates a new sysctl, min_filelist_kbytes, which disables reclaim
>> >> > of file-backed pages when there are less than min_filelist_kbytes worth
>> >> > of such pages in the cache. This tunable is handy for low memory systems
>> >> > using solid-state storage, where interactive response is more important
>> >> > than not OOMing.
>> >> >
>> >> > With this patch and min_filelist_kbytes set to 50000, I see very little
>> >> > block layer activity during low memory. The system stays responsive under
>> >> > low memory and browser tab switching is fast. Eventually, a process gets
>> >> > killed by OOM. Without this patch, the system gets wedged for minutes
>> >> > before it eventually OOMs. Below is the vmstat output from my test runs.
>> >> >
>> >> > BEFORE (notice the high bi and wa, also how long it takes to OOM):
>> >>
>> >> That's an interesting result.
>> >>
>> >> Having the machine "wedged for minutes" thrashing away paging
>> >> executable text is pretty bad behaviour. I wonder how to fix it.
>> >> Perhaps simply declaring oom at an earlier stage.
>> >>
>> >> Your patch is certainly simple enough, but a bit sad. It says "the VM
>> >> gets this wrong, so let's just disable it all", and thereby reduces the
>> >> motivation to fix it for real.
>> >>
>> >
>> > Yeah, I used the RFC label because we're thinking this is just a temporary
>> > bandaid until something better comes along.
>> >
>> > A couple of other nits I have with our patch:
>> > * Not really sure what to do for the cgroup case. We do something
>> >   reasonable for now.
>> > * One of my colleagues also brought up the point that we might want to do
>> >   something different if swap was enabled.
>> >
>> >> But the patch definitely improves the situation in real-world
>> >> situations and there's a case to be made that it should be available at
>> >> least as an interim thing until the VM gets fixed for real. Which
>> >> means that the /proc tunable might disappear again (or become a no-op)
>> >> some time in the future.
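For reference, a minimal sketch of what such a check could look like in
mm/vmscan.c. The patch body itself is not quoted in this thread, so the
helper name, its placement, and the kbytes-to-pages conversion below are
assumptions, not the actual ChromiumOS patch:

  int min_filelist_kbytes;  /* exposed as /proc/sys/vm/min_filelist_kbytes */

  static int file_list_is_low(struct zone *zone)
  {
          /* kbytes -> pages: divide by (PAGE_SIZE / 1024) */
          unsigned long min_pages = min_filelist_kbytes >> (PAGE_SHIFT - 10);
          unsigned long file_pages =
                  zone_page_state(zone, NR_ACTIVE_FILE) +
                  zone_page_state(zone, NR_INACTIVE_FILE);

          return file_pages < min_pages;
  }

A caller in the scan path (e.g. get_scan_count()) would then skip the file
LRUs whenever file_list_is_low() returns true, so the kernel OOMs promptly
instead of thrashing executable text for minutes.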
>>
>> I think this feature says "an unresponsive system is not acceptable, but OOM is".
>> While we can control which processes are killed by OOM using
>> /oom_score_adj, we can't control response time directly.
>> But in mobile systems, we have to control response time. One of the reasons
>> to avoid swap is response time.
>>
>> How about using memcg?
>> Isolate processes related to system responsiveness (ex, rendering engine, IPC
>> engine and so on) into another group.
>>
> Yes, this seems an interesting topic for memcg.
>
> Maybe configure cgroups as..
>
>  /system      ....... limited to X % of the system.
>  /application ....... limited to 100-X % of the system.
>
> and put the management software into /system. Then, the system software can check
> the behavior of applications and measure cpu time and I/O performance in /application.
> (And yes, it can watch memory usage.)
>
> Here, memory cgroup has an oom-notifier, so you may be able to do something other
> than the oom-killer in the system. If this patch is applied to the global VM, I'll
> check whether memcg can support it or not.
> Hmm....checking the anon/file ratio in /application may be enough?

I think anon/file/mapped_file is enough to do that.

> Or, as a Google guy proposed, we may have to add a "file-cache-only" memcg.
> For example, configure the system as
>
>  /system
>  /application-anon
>  /application-file-cache
>
> (But balancing file/anon must be done by the user....this is difficult.)

Yes. I believe such fine-grained control would make system administration
more annoying.

> BTW, can we know a "recently paged out file cache comes back immediately!"
> score?

Not easy. If we could get it easily, we could enhance the victim selection
algorithm. AFAIR, Rik tried it.
http://lwn.net/Articles/147879/

--
Kind regards,
Minchan Kim
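For reference, the oom-notifier mentioned above is the cgroup-v1
memory.oom_control interface: a management daemon in /system can block on
an eventfd until /application hits its limit and goes OOM. A minimal
sketch, assuming the memory controller is mounted at /cgroup and the group
names from the thread (both are illustrative; error handling trimmed):

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/eventfd.h>
  #include <unistd.h>

  int main(void)
  {
          int efd = eventfd(0, 0);
          int oom = open("/cgroup/application/memory.oom_control", O_RDONLY);
          int ctl = open("/cgroup/application/cgroup.event_control", O_WRONLY);
          char buf[32];
          uint64_t hits;

          if (efd < 0 || oom < 0 || ctl < 0)
                  return 1;

          /* Register: "<eventfd> <fd of memory.oom_control>" */
          snprintf(buf, sizeof(buf), "%d %d", efd, oom);
          if (write(ctl, buf, strlen(buf)) < 0)
                  return 1;

          /* Blocks until /application is under OOM. */
          read(efd, &hits, sizeof(hits));
          printf("/application under OOM: act in userspace (e.g. kill a tab)\n");
          return 0;
  }

The same daemon can read /cgroup/application/memory.stat, whose cache, rss,
and mapped_file counters cover the anon/file/mapped_file rates discussed
above.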