On Fri, Aug 10, 2012 at 01:33:04PM +0300, Kirill A. Shutemov wrote:
>On Fri, Aug 10, 2012 at 11:49:12AM +0800, Wanpeng Li wrote:
>> On Thu, Aug 09, 2012 at 12:08:11PM +0300, Kirill A. Shutemov wrote:
>> >From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>> >
>> >During testing I noticed big (up to 2.5 times) memory consumption overhead
>> >on some workloads (e.g. ft.A from NPB) if THP is enabled.
>> >
>> >The main reason for that big difference is the lack of a zero page in the
>> >THP case. We have to allocate a real page on a read page fault.
>> >
>> >A program to demonstrate the issue:
>> >
>> >#include <assert.h>
>> >#include <stdlib.h>
>> >#include <unistd.h>
>> >
>> >#define MB 1024*1024
>> >
>> >int main(int argc, char **argv)
>> >{
>> >	char *p;
>> >	int i;
>> >
>> >	posix_memalign((void **)&p, 2 * MB, 200 * MB);
>> >	for (i = 0; i < 200 * MB; i += 4096)
>> >		assert(p[i] == 0);
>> >	pause();
>> >	return 0;
>> >}
>> >
>> >With thp-never RSS is about 400k, but with thp-always it's 200M.
>> >After the patchset thp-always RSS is 400k too.
>> >
>> Hi Kirill,
>>
>> Thank you for your patchset; I have some questions to ask.
>>
>> 1. In your patchset, on a read page fault the pmd is populated with the
>> huge zero page. IIUC, assert(p[i] == 0) is a read operation, so why is
>> thp-always RSS 400k? You allocate 100 huge pages; why does each cost 4k?
>> I think the right overhead should be 2MB for the huge zero page instead
>> of 400k. What am I missing?
>
>The 400k comes not from the allocation, but from the libc runtime. The test
>program consumes about the same without any allocation at all.
>
>The zero page is a global resource. The system owns it. It's not accounted
>to any process.
>
>>
>> 2. If the user wants to allocate 200MB, 100 huge pages are needed in
>> total. In your patchset's logic the code allocates one 2MB huge zero
>> page and maps it through all the associated pmds.
>> When the user attempts to write to the pages, a wp fault will be
>> triggered, and if allocating a huge page fails the code falls back to
>> do_huge_pmd_wp_zero_page_fallback in your patch logic. But you just
>> create a new page table and set the pte around the fault address to the
>> newly allocated page; all other ptes are set to the normal zero page.
>> In this scenario the user only gets one 4K page and all other zero
>> pages, so how can the code continue to work? Why not fall back to
>> allocating normal pages, even if they are not physically contiguous?
>
>Since we allocate a 4k page around the fault address, the fault is handled.
>Userspace can use it.
>
>If the process tries to write to any other 4k page of this area, a new
>fault will be triggered and do_wp_page() will allocate a real page.
>
>It's not reasonable to allocate all 4k pages in the fallback path. We can
>postpone it until userspace really wants to use them. This way we reduce
>memory pressure in the fallback path.

Oh, I see. Thanks for your response and your good work. :)

Regards,
Wanpeng Li

>
>> 3. In your patchset's logic:
>> "In the fallback path we create a new table and set the pte around the
>> fault address to the newly allocated page. All other ptes are set to
>> the normal zero page."
>> When will these zero pages be replaced by real pages, and when is the
>> memcg charge added?
>
>I guess I've answered the question above.
>
>> Looking forward to your detailed response, thank you! :)
>
>Thanks for your questions.
>
>--
> Kirill A. Shutemov