On Thu, 13 Oct 2011 16:33:21 +0900
Minchan Kim <minchan.kim@xxxxxxxxx> wrote:

> On Fri, Sep 02, 2011 at 12:31:14PM -0400, Satoru Moriya wrote:
> > On 09/01/2011 05:58 PM, Andrew Morton wrote:
> > > On Thu, 1 Sep 2011 15:26:50 -0400
> > > Rik van Riel <riel@xxxxxxxxxx> wrote:
> > >
> > >> Add a userspace visible knob
> > >
> > > argh.  Fear and hostility at new knobs which need to be maintained for
> > > ever, even if the underlying implementation changes.
> > >
> > > Unfortunately, this one makes sense.
> > >
> > >> to tell the VM to keep an extra amount of memory free, by increasing
> > >> the gap between each zone's min and low watermarks.
> > >>
> > >> This is useful for realtime applications that call system calls and
> > >> have a bound on the number of allocations that happen in any short
> > >> time period.  In this application, extra_free_kbytes would be left at
> > >> an amount equal to or larger than the maximum number of
> > >> allocations that happen in any burst.
> > >
> > > _is_ it useful?  Proof?
> > >
> > > Who is requesting this?  Have they tested it?  Results?
> >
> > This is interesting for me.
> >
> > Some of our customers have realtime applications and they are concerned
> > about the fact that Linux uses free memory as pagecache. It means that
> > when their application allocates memory, the Linux kernel tries to reclaim
> > memory first and then allocates it. This may make memory allocation
> > latency bigger.
> >
> > In many cases this is not a big issue because Linux has kswapd for
> > background reclaim and it is fast enough not to enter the direct reclaim
> > path if there is a lot of clean cache. But under some situations -
> > e.g. an application allocates a lot of memory, larger than the delta
> > between watermark_low and watermark_min, in a short time and kswapd
> > can't reclaim fast enough due to dirty page reclaim - direct reclaim
> > is executed and causes big latency.
> >
> > We can avoid the issue above by using preallocation and mlock.
> > But it can't cover kmalloc used in system calls. So I'd like to use
> > this patch with mlock to keep memory allocation latency as
> > low as possible. It may not be a perfect solution but it is important
> > for customers in the enterprise area to configure the amount of free
> > memory at their own risk.
>
> I agree with the need for such a feature but don't like exporting such a
> primitive interface to the user.
>
> As Satoru said, we can reserve free pages for the user through preallocation and mlocking.
> The issue is free pages for the kernel itself.
> The most desirable thing is to avoid syscalls in the critical realtime section.
> But if we can't avoid them, my crazy idea is to use memcg for kernel pages.
> Of course, we would have to implement it and it is not simple stuff but AFAIK, memcg people
> always consider it and finally will do it. :)
> Recently, Glauber tried "Basic kernel memory functionality" but I haven't reviewed
> it yet. I am not sure we can reuse it, anyway. Kame?
>

I reviewed it and it seems good. It adds kmem.limit_in_bytes, so we're ready
to go forward to a kernel memory cgroup. But it adds only interfaces for now.
I think Greg Thelen <gthelen@xxxxxxxxxx> has some ideas.

> My simple idea is as follows,
>
> We can assign a basic reserved page pool and/or a user-determined page pool size
> for each task registered at memcg-slab. Hmm, memcg-mempool ?
> The application has to notify memcg of the start of the RT section before it goes to
> the RT section. So, memcg could fill up the page pool if it is short. In this case,
> the application can get stuck but it's okay as it hasn't entered the RT section yet.
> The application has to notify memcg of the end of the RT section, too, so that memcg
> could try to fill up the reserved page pool in case of shortage.
>

That 'notification' doesn't sound good to me. When an application dies or moves
to another group without notification, memcg will be unstable.
It should be the task's state rather than memcg's state.
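
For readers unfamiliar with the "preallocation and mlock" approach quoted above, here is a minimal userspace sketch in C. The helper names rt_prealloc() and rt_release() are hypothetical and not part of any patch in this thread; only posix_memalign(3), mlock(2) and munlock(2) are real calls. The sketch assumes the buffer is set up before the realtime section begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * rt_prealloc() is a hypothetical helper, not an existing API.
 * It allocates `len` bytes before the realtime section starts, writes to
 * the buffer so every page is backed by physical memory, and then tries
 * to pin the pages with mlock(2).
 *
 * *locked is set to 1 if mlock() succeeded and 0 otherwise (for example
 * when RLIMIT_MEMLOCK is smaller than `len`).
 */
void *rt_prealloc(size_t len, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    assert(locked != NULL);
    *locked = 0;
    if (page <= 0 || len == 0)
        return NULL;

    /* Page-aligned so the buffer starts on a page boundary. */
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;

    /* Fault the pages in now rather than inside the realtime section. */
    memset(buf, 0, len);

    if (mlock(buf, len) == 0)
        *locked = 1;
    return buf;
}

/* Release a buffer obtained from rt_prealloc(). */
void rt_release(void *buf, size_t len, int locked)
{
    if (buf == NULL)
        return;
    if (locked)
        munlock(buf, len);
    free(buf);
}
```

As Satoru points out, this only covers the application's own pages. Allocations the kernel makes on the task's behalf inside a system call are not covered, which is the gap the rest of the thread is about.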
> The reason we need such notification is that kswapd high priority, a new knob
> and others can never meet the application's deadline requirement in some
> situations (e.g. there are many dirty pages in the LRU, or anon pages fill up
> memory in the non-swap case, and so on), so the application might end up stuck
> at some point. That point must be outside the RT section of the task.
>
> For implementation, we might need a new watermark setting for each memcg and/or
> something like kswapd priority promotion for hurried reclaim.
> Anyway, they are just implementations and we could enhance/add more through
> various techniques as time goes by.
>
> Personally, I think it could be a valuable feature.
>

Hmm.

To avoid latency at allocation, the only thing we can do is pre-allocate before
the memory is required. But the problem is that applications cannot forecast
when the 'burst' allocation happens, so we always need a memory pool prepared.

I think we need 2 implementations.
 1. free-page mempool for a memcg.
 2. a background reclaim thread for a memcg. This is triggered by the mempool.
    The priority of this thread should be controllable in some way.
    If we take care of memcg's limit, the watermark should trigger background reclaim. ?

But the memory reclaim routine should never sleep...

Thanks,
-Kame
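
The two-part idea above (a free-page mempool, plus background reclaim triggered by the mempool's watermark) can be modelled in userspace roughly as follows. This is only an illustrative sketch: struct page_pool, pool_init(), pool_take() and pool_refill() are hypothetical names and do not exist in the kernel, and the refill_pending flag merely stands in for waking a per-memcg reclaim thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct page_pool {
    size_t free_pages;   /* pages currently held in reserve */
    size_t low_mark;     /* below this, background refill is requested */
    size_t high_mark;    /* refill target, also the pool capacity */
    int refill_pending;  /* stands in for waking a reclaim thread */
};

/* Start with a full pool. */
void pool_init(struct page_pool *p, size_t low, size_t high)
{
    assert(low <= high);
    p->free_pages = high;
    p->low_mark = low;
    p->high_mark = high;
    p->refill_pending = 0;
}

/*
 * Fast path: take n pages without sleeping.
 * Returns 0 on success, -1 if the reserve cannot cover the request
 * (a real caller would then fall back to the slow allocation path).
 */
int pool_take(struct page_pool *p, size_t n)
{
    if (n > p->free_pages) {
        p->refill_pending = 1;
        return -1;
    }
    p->free_pages -= n;
    if (p->free_pages < p->low_mark)
        p->refill_pending = 1;   /* watermark crossed: ask for refill */
    return 0;
}

/*
 * Background side: `reclaimed` is the number of pages the reclaim
 * thread managed to free. The pool never grows past high_mark.
 */
void pool_refill(struct page_pool *p, size_t reclaimed)
{
    size_t room = p->high_mark - p->free_pages;

    p->free_pages += reclaimed < room ? reclaimed : room;
    if (p->free_pages >= p->low_mark)
        p->refill_pending = 0;
}
```

In this model pool_take() never blocks, so a burst smaller than the reserve is served immediately, and the refill cost is paid by whoever runs pool_refill(). How the priority of that thread is controlled, and how the pool interacts with memcg's limit, are left open here just as they are in the mail.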