Hello HATAYAMA-san,

On Mon, 14 May 2012 14:44:28 +0900 (JST)
HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:

> From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
> Subject: Re: makedumpfile memory usage grows with system memory size
> Date: Fri, 27 Apr 2012 16:46:49 +0900
>
> > - Now, the prototype doesn't support PG_buddy because the value of PG_buddy
> >   is different depending on kernel configuration and it isn't stored into
> >   VMCOREINFO. However, I'll extend get_length_of_free_pages() for PG_buddy
> >   when the value of PG_buddy is stored into VMCOREINFO.
>
> Hello Kumagai san,
>
> I'm now investigating how to perform filtering free pages without
> kernel debuginfo. For this, I've investigated which of PG_buddy and
> _mapcount to use in each kernel version. In the current conclusion, it's
> reasonable to do that as shown in the following table.
>
> | kernel version   | Use PG_buddy? or _mapcount?                              |
> |------------------+----------------------------------------------------------|
> | 2.6.15 -- 2.6.16 | offsetof(page,_mapcount):=sizeof(ulong)+sizeof(atomic_t) |
> | 2.6.17 -- 2.6.26 | PG_buddy := 19                                           |
> | 2.6.27 -- 2.6.36 | PG_buddy := 18                                           |
> | 2.6.37 and later | offsetof(page,_mapcount):= under investigation           |
>

Thank you for your investigation, it's very helpful!

> In summary: PG_buddy was first introduced at 2.6.17 as 19 to fix some
> race bug leading to lru list corruptions, and from 2.6.17 to 2.6.26,
> it had been defined using preprocessor macros. At 2.6.27 enum pageflags
> was introduced for ease of page flags maintenance and its value
> changed to 18. At 2.6.37, it was removed, and it no longer exists in
> later kernel versions.
>
> My quick feeling is that solving dependency of PG_buddy is simpler than
> that of _mapcount from 2.6.17 to 2.6.36.
>
> From 2.6.15 to 2.6.16, PG_buddy has not been introduced so we need to
> rely on _mapcount.
> It's very complex to solve _mapcount dependency in
> general on all supported kernel versions, but on just these two kernel
> versions, the definition of struct page begins with the following
> layout. I think it's not so complex to hardcode the offset of
> _mapcount for these two kernel versions only: that is, sizeof(unsigned
> long) + sizeof(atomic_t) which is in fact struct { volatile int
> counter } on all platforms.
>
>   struct page {
>           unsigned long flags;            /* Atomic flags, some possibly
>                                            * updated asynchronously */
>           atomic_t _count;                /* Usage count, see below. */
>           atomic_t _mapcount;             /* Count of ptes mapped in mms,
>   ...
>
> In the period when PG_buddy is defined as an enumeration value, the PG_buddy
> value depends on CONFIG_PAGEFLAGS_EXTENDED. At commit
> e20b8cca760ed2a6abcfe37ef56f2306790db648, PG_head and PG_tail were
> introduced and they are positioned before PG_buddy if
> CONFIG_PAGEFLAGS_EXTENDED is set; then PG_buddy value becomes
> 19. However, its users are mips, um and xtensa only as:
>
>   $ git grep "CONFIG_PAGEFLAGS_EXTENDED"
>   arch/mips/configs/db1300_defconfig:CONFIG_PAGEFLAGS_EXTENDED=y
>   arch/um/defconfig:CONFIG_PAGEFLAGS_EXTENDED=y
>   arch/xtensa/configs/iss_defconfig:CONFIG_PAGEFLAGS_EXTENDED=y
>   arch/xtensa/configs/s6105_defconfig:CONFIG_PAGEFLAGS_EXTENDED=y
>   include/linux/page-flags.h:#ifdef CONFIG_PAGEFLAGS_EXTENDED
>   include/linux/page-flags.h:#ifdef CONFIG_PAGEFLAGS_EXTENDED
>   mm/memory-failure.c:#ifdef CONFIG_PAGEFLAGS_EXTENDED
>   mm/page_alloc.c:#ifdef CONFIG_PAGEFLAGS_EXTENDED
>
> and makedumpfile doesn't support any of these platforms now. So we
> don't need to consider this case any more.
>
> On 2.6.37 and the later kernels, we must use _mapcount. I'm now
> looking into how to get the offset of _mapcount in each kernel version
> without kernel debug information. But the page structure has changed
> considerably on recent kernels so I guess hardcoding them gets
> more complicated.
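
The table above could be written down as a small lookup, roughly as
follows. This is only a sketch: KVER(), struct buddy_rule and
buddy_rule_for() are hypothetical names, not existing makedumpfile code.
It assumes the CONFIG_PAGEFLAGS_EXTENDED case is ignored, as argued
above, and that kernels older than 2.6.15 (not covered by the table) are
simply reported as unsupported.

```c
#include <stddef.h>

/* Hypothetical helper: pack a kernel version a.b.c into one comparable int. */
#define KVER(a, b, c) (((a) << 16) | ((b) << 8) | (c))

enum buddy_method {
	BUDDY_UNSUPPORTED,	/* version not covered by the table */
	BUDDY_PG_FLAG,		/* test the PG_buddy bit in page.flags */
	BUDDY_MAPCOUNT		/* inspect page._mapcount */
};

struct buddy_rule {
	enum buddy_method method;
	int pg_buddy_bit;	/* only for BUDDY_PG_FLAG, otherwise -1 */
	long mapcount_offset;	/* only for BUDDY_MAPCOUNT, -1 = not known yet */
};

/*
 * sizeof_long and sizeof_atomic describe the crashed kernel, not the
 * host running the tool, so they are passed in instead of using sizeof().
 */
static struct buddy_rule buddy_rule_for(int kver, size_t sizeof_long,
					size_t sizeof_atomic)
{
	struct buddy_rule r = { BUDDY_UNSUPPORTED, -1, -1 };

	if (kver < KVER(2, 6, 15))
		return r;

	if (kver <= KVER(2, 6, 16)) {
		/* flags, then _count, then _mapcount */
		r.method = BUDDY_MAPCOUNT;
		r.mapcount_offset = (long)(sizeof_long + sizeof_atomic);
	} else if (kver <= KVER(2, 6, 26)) {
		r.method = BUDDY_PG_FLAG;
		r.pg_buddy_bit = 19;
	} else if (kver <= KVER(2, 6, 36)) {
		r.method = BUDDY_PG_FLAG;
		r.pg_buddy_bit = 18;
	} else {
		/* 2.6.37 and later: offset is still under investigation */
		r.method = BUDDY_MAPCOUNT;
	}
	return r;
}
```

For example, buddy_rule_for(KVER(2, 6, 16), 8, 4) would give
BUDDY_MAPCOUNT with offset 12 for a 64-bit dump, while any 2.6.37 or
later version gives BUDDY_MAPCOUNT with the offset left at -1.
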
>
> Anyway, I think it better to add _mapcount information to VMCOREINFO
> on upstream as soon as possible.

I think using _mapcount is the better way. But we haven't definitely decided
to use _mapcount, and even if we decide to use it, we still have problems
using it. For example, the upstream kernel (v3.4-rc7) has _mapcount in a
union, so we need some information to judge whether the found data is
_mapcount or not.
So, more investigation is needed and I think it's too early to send the request
to the upstream kernel.

I plan to finish the work on reducing memory consumption by the end of June,
and I will continue to discuss performance issues.
Therefore, the request will be delayed until July or August.


Thanks
Atsushi Kumagai
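
A note on the union problem mentioned above, as a minimal sketch. The
struct below is a simplified, hypothetical stand-in and not the real
v3.4-rc7 struct page; read_word_at() and word_is_mapcount() are also
made-up names. It only shows that the bytes at the _mapcount offset look
the same whichever union member was written, so some other information
(here, assumed to be a flag bit passed in by the caller) is needed to
decide how to interpret them.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in: the same bytes hold _mapcount or other data. */
struct fake_page {
	unsigned long flags;
	union {
		int _mapcount;			/* stands in for atomic_t */
		unsigned int other_counters;	/* some other user of the bytes */
	} u;
};

/* What a dump tool sees: raw bytes at a known offset in the page struct. */
static int read_word_at(const void *page, size_t offset)
{
	int v;

	memcpy(&v, (const char *)page + offset, sizeof(v));
	return v;
}

/*
 * Hypothetical discriminator: treat the word as _mapcount only when the
 * flag bit that marks the other user of the union is clear.
 */
static int word_is_mapcount(unsigned long flags, int other_user_bit)
{
	return !(flags & (1UL << other_user_bit));
}
```

With only the offset, a page whose u._mapcount is -1 and a page whose
u.other_counters is 0xffffffff read back as the same word on the usual
two's complement platforms; only the extra flag check tells them apart.
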