From: Atsushi Kumagai <kumagai-atsushi@xxxxxxxxxxxxxxxxx>
Subject: Re: makedumpfile memory usage grows with system memory size
Date: Mon, 2 Apr 2012 16:46:51 +0900

> On Fri, 30 Mar 2012 09:51:43 +0900
> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>> For performance impact, I don't know that exactly. But I guess
>> iterating filtering processing is most significant. I don't know exact
>> data structure for each kind of memory, but if there's the ones
>> needing linear order to look up the data for a given page frame
>> number, there would be necessary to add some special handling not to
>> reduce performance.
>
> Thank you for your idea.
>
> I think this is an important issue and I have no idea except iterating
> filtering processes for each memory range.
>
> But as you said, we should consider the issue related to performance.
> For example, makedumpfile must parse free_list repeatedly to distinguish
> whether each pfn is a free page or not, because each range may be inside
> the same zone. It will be overhead.

Hello Kumagai-san,

I looked into the contents of free_list and confirmed that even buddies
of the same order are not linked in pfn order. Below is the output of a
makedumpfile I customized so that it prints the buddy data it walks:

# ./makedumpfile --message-level 32 -c -d 31 /media/127.0.0.1-2012-04-04-20:31:58/vmcore vmcore-cd31
NR_ZONE: 0
order: 10 migrate_type: 2 pfn: 3072
order: 10 migrate_type: 2 pfn: 2048
order: 10 migrate_type: 2 pfn: 1024
order: 9 migrate_type: 3 pfn: 512
order: 8 migrate_type: 0 pfn: 256
order: 6 migrate_type: 0 pfn: 64
order: 5 migrate_type: 0 pfn: 32
order: 4 migrate_type: 0 pfn: 128
order: 4 migrate_type: 0 pfn: 16
order: 2 migrate_type: 0 pfn: 144
order: 1 migrate_type: 0 pfn: 148
NR_ZONE: 1
order: 10 migrate_type: 2 pfn: 226304
order: 10 migrate_type: 2 pfn: 225280
order: 10 migrate_type: 2 pfn: 486400
order: 10 migrate_type: 2 pfn: 485376
order: 10 migrate_type: 2 pfn: 484352
order: 10 migrate_type: 2 pfn: 483328
order: 10 migrate_type: 2 pfn: 482304
order: 10 migrate_type: 2 pfn: 481280
<snip>

So we cannot simply walk free_list in increasing pfn order for a given
range of memory, suspend the walk, and resume from the saved position
for the next range. It's necessary to create a table that can be looked
up in constant time, and that table has to live in memory: in the 2nd
kernel we cannot assume any backing store in general (consider dumping
over scp, for example).

I think the basic ideas are the usual small-memory programming
techniques, like:

* Create only the part of the bitmap corresponding to the range of
  memory currently being processed, and repeat the table creation each
  time a new range is started (a rough sketch of this follows below).
  => It is difficult to avoid scanning the whole free_list every time,
  but this is the only idea I have come up with that keeps the memory
  consumption stably constant.

* Keep the table as a list of memory mappings rather than a bitmap, and
  switch back to a bitmap if it grows larger than the bitmap would be.
  => Bad performance in heavily fragmented cases, and constructing the
  mapping takes O(n^2), so it would be costly if done many times.

* Compress the parts of the bitmap other than the one currently being
  processed.
  => Bad performance when the data does not compress well, or when
  compression has to be done too many times.

But before that, I also want to consider the possibility of increasing
the memory reserved for the 2nd kernel.
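To make the first idea a bit more concrete, here is a rough sketch I
wrote for this mail. It is not makedumpfile's real code: the hard-coded
(pfn, order) table (taken from the NR_ZONE 0 output above) just stands
in for the free_area[].free_list walk the real tool would do with
readmem(). The point is that, because the list is not sorted by pfn,
the whole list has to be rescanned for every range, while the partial
bitmap itself only ever covers the range currently being processed:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_BYTE 8

struct free_block { uint64_t pfn; int order; };

/* stand-in for walking free_area[order].free_list in the dump image */
static const struct free_block sample_free_list[] = {
    { 3072, 10 }, { 2048, 10 }, { 1024, 10 }, { 512, 9 }, { 256, 8 },
    { 64, 6 }, { 32, 5 }, { 128, 4 }, { 16, 4 }, { 144, 2 }, { 148, 1 },
};

struct partial_bitmap {
    uint64_t start_pfn;          /* first pfn covered by this chunk  */
    uint64_t end_pfn;            /* one past the last covered pfn    */
    unsigned char *buf;          /* (end_pfn - start_pfn) / 8 bytes  */
};

static void set_free_bit(struct partial_bitmap *bm, uint64_t pfn)
{
    if (pfn < bm->start_pfn || pfn >= bm->end_pfn)
        return;                  /* outside the range being processed */
    uint64_t off = pfn - bm->start_pfn;
    bm->buf[off / BITS_PER_BYTE] |= 1U << (off % BITS_PER_BYTE);
}

/*
 * Since free_list is not sorted by pfn, every range needs a full pass
 * over every zone's free lists: O(nr_ranges * free_list_length).
 */
static void build_free_bitmap_for_range(struct partial_bitmap *bm)
{
    size_t i;

    memset(bm->buf, 0, (bm->end_pfn - bm->start_pfn) / BITS_PER_BYTE);

    for (i = 0; i < sizeof(sample_free_list) / sizeof(sample_free_list[0]); i++) {
        uint64_t pfn = sample_free_list[i].pfn;
        uint64_t nr  = 1ULL << sample_free_list[i].order; /* 2^order pages */
        uint64_t j;

        for (j = 0; j < nr; j++)
            set_free_bit(bm, pfn + j);
    }
}

int main(void)
{
    /* process pfns [0, 4096) in chunks of 1024 pfns (128 bytes each) */
    unsigned char buf[1024 / BITS_PER_BYTE];
    struct partial_bitmap bm = { .buf = buf };
    uint64_t start, pfn;

    for (start = 0; start < 4096; start += 1024) {
        uint64_t free_pages = 0;

        bm.start_pfn = start;
        bm.end_pfn   = start + 1024;
        build_free_bitmap_for_range(&bm);

        for (pfn = start; pfn < bm.end_pfn; pfn++)
            if (buf[(pfn - start) / BITS_PER_BYTE] & (1U << ((pfn - start) % BITS_PER_BYTE)))
                free_pages++;
        printf("pfn [%llu, %llu): %llu free pages\n",
               (unsigned long long)start, (unsigned long long)bm.end_pfn,
               (unsigned long long)free_pages);
    }
    return 0;
}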
In the discussion of the 512MB reservation regression last month, Vivek
explained that 512MB is the current maximum value and is enough for
systems of up to 6TB:

  https://lkml.org/lkml/2012/3/13/372

But on such machines, where makedumpfile performance is affected, there
seems to be room to reserve more than 512MB of memory. Also, following
Vivek, Yinghai said that system memory sizes will keep growing over the
next years.

Note:
* 1 bit in the bitmap represents 1 page frame. On x86, 1 byte covers
  32KB of memory, so 1TB of memory requires 32MB. The dump uses two
  bitmaps, so 64MB is needed in total.
* The performance problem is for free pages only. Cache, cache private,
  user and zero pages are already processed per range of memory with
  good performance.

Thanks.
HATAYAMA, Daisuke
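P.S. To spell out the arithmetic in the note above, here is a quick
throw-away calculation of my own (assuming 4KB pages and the two
bitmaps the note mentions):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE     4096ULL   /* x86, 4KB pages               */
#define BITS_PER_BYTE 8ULL
#define NR_BITMAPS    2ULL      /* the dump keeps two bitmaps   */

/* total bitmap bytes needed to cover mem_bytes of physical memory */
static uint64_t bitmap_bytes(uint64_t mem_bytes)
{
    uint64_t nr_pages = mem_bytes / PAGE_SIZE;  /* 1 bit per page frame */
    return NR_BITMAPS * nr_pages / BITS_PER_BYTE;
}

int main(void)
{
    uint64_t tb = 1ULL << 40;

    /* 1TB -> 64MB in total (2 x 32MB), 6TB -> 384MB */
    printf("1TB: %llu MB\n", (unsigned long long)(bitmap_bytes(1 * tb) >> 20));
    printf("6TB: %llu MB\n", (unsigned long long)(bitmap_bytes(6 * tb) >> 20));
    return 0;
}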