On Fri, Apr 18, 2014 at 09:41:33PM +0200, Petr Tesarik wrote:
> On Fri, 18 Apr 2014 22:29:12 +0800
> "bhe at redhat.com" <bhe at redhat.com> wrote:
>
> > > >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> > > >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> > > >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> > > >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> > > >> use, e.g page cache, dynamic allocation. OOM will happen.
> > > >>
> > > >
> > > >BTW, in our case, there's about 30M free memory when we started saving
> > > >dump. It should be caused by my coarse estimation above.
> > >
> > > Thanks for your description, I understand that situation and
> > > the nature of the problem.
> > >
> > > That is, the assumption that 20% of free memory is enough for
> > > makedumpfile can be broken if free memory is too small.
> > > If your machine has 200GB memory, OOM will happen even after fix
> > > the too allocation bug.
> >
> > Well, we have done some experiments to try to get the statistical memory
> > range which kdump really need. Then a final reservation will be
> > calculated automatically as (base_value + linear growth of total memory).
> > If one machine has 200GB memory, its reservation will grow too. Since
> > except of the bitmap cost, other memory cost is almost fixed.
> >
> > Per this scheme things should be go well, if memory always goes to the
> > edge of OOM, an adjust of base_value is needed. So a constant value as
> > you said may not be needed.
> >
> > Instead, I am wondering how the 80% comes from, and why 20% of free
> > memory must be safe.
>
> I believe these 80% come from the default value of vm.dirty_ratio,

Actually, I suggested this 80% number when the --cyclic feature was
implemented, and I did not base it on dirty_ratio. It was just a random
suggestion.

> which is 20%. In other words, the kernel won't block further writes
> until 20% of available RAM is used up by dirty cache. But if you
> fill up all free memory with dirty pages and then touch another (though
> allocated) page, the kernel will go into direct reclaim, and if nothing
> can be written out ATM, it will invoke the OOM Killer.

We can start playing with reducing dirty_ratio too and see how it goes.

Thanks
Vivek
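
For reference, the arithmetic in the quoted 100G example can be reproduced with a short sketch. This is not makedumpfile code: the 4 KiB page size, the one-bit-per-page bitmap, and the helper names bitmap_bytes and estimate are assumptions made for illustration. The step where two buffers of bufsize_cyclic are allocated is inferred from the "only 3M is left" remark, not taken from the source.

```python
# Illustrative only; all names and the page size are assumptions.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PAGE_SIZE = 4 * KIB  # assumption: 4 KiB pages


def bitmap_bytes(total_mem):
    """Bytes for one bitmap holding one bit per page (assumed layout)."""
    pages = total_mem // PAGE_SIZE
    return (pages + 7) // 8


def estimate(total_mem, free_mem):
    """Mirror the numbers quoted in the thread for the old sizing logic."""
    needed_size = 2 * bitmap_bytes(total_mem)  # "3200K*2" in the example
    free_size = free_mem * 4 // 10             # "15M*0.4" in the example
    bufsize_cyclic = min(needed_size, free_size)
    # Inferred: two buffers of bufsize_cyclic end up allocated,
    # which is what leaves only 3M in the quoted example.
    allocated = 2 * bufsize_cyclic
    return {
        "needed_size": needed_size,
        "free_size": free_size,
        "bufsize_cyclic": bufsize_cyclic,
        "left": free_mem - allocated,
    }


if __name__ == "__main__":
    result = estimate(100 * GIB, 15 * MIB)
    for key, value in result.items():
        print(f"{key}: {value / MIB:.2f} MiB")
```

With 100G of RAM and 15M free, this gives a 6M bufsize_cyclic and 3M left over, matching the figures in the quoted report. With 30M free, the min() picks needed_size instead, so the outcome depends strongly on how much memory happens to be free when dumping starts.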