Re: [v3 0/9] parallelized "struct page" zeroing

> OK, so why cannot we make zero_struct_page 8x 8B stores, other arches
> would do memset. You said it would be slower but would that be
> measurable? I am sorry to be so persistent here but I would be really
> happier if this didn't depend on the deferred initialization. If this is
> absolutely a no-go then I can live with that of course.

Hi Michal,

This is actually a very good idea. I just did some measurements, and it looks like performance is very good.

Here is single-thread data from a SPARC M7 with 3312G of memory:

Current:
memset() in memblock allocator takes: 8.83s
__init_single_page() takes: 8.63s

Option 1:
memset() in __init_single_page() takes: 61.09s (as we discussed, this is because of membar overhead; memset should really be optimized to do STBI only when the size is 1 page or bigger).

Option 2:
8 stores (stx) in __init_single_page(): 8.525s!

So, even single-threaded, we can roughly double the initialization speed of "struct page" on SPARC (8.83s + 8.63s = 17.46s today versus 8.525s) by removing memset() from memblock and using 8 stx in __init_single_page(). It appears we never miss L1 in __init_single_page() after the initial 8 stx.
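
To put these timings in per-page terms, here is a small back-of-the-envelope helper. It assumes 8K base pages (the usual sparc64 configuration) and reads "3312G" as GiB; both are assumptions on my side rather than something measured above, and ns_per_struct_page() is just a name made up for this sketch.

```c
#include <assert.h>

/*
 * Nanoseconds spent per struct page, given the total time in seconds,
 * the memory size in GiB, and the base page size in KiB.
 * Assumption: one struct page per base page, no memory holes.
 */
static double ns_per_struct_page(double seconds, double mem_gib, double page_kib)
{
        assert(mem_gib > 0 && page_kib > 0);

        /* GiB -> KiB, then divide by the page size to get the page count. */
        double pages = mem_gib * 1024.0 * 1024.0 / page_kib;

        return seconds * 1e9 / pages;
}
```

Under those assumptions, ns_per_struct_page(8.525, 3312, 8) works out to roughly 20ns per struct page for Option 2, and ns_per_struct_page(61.09, 3312, 8) to roughly 140ns for Option 1.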

I will update the patches to use memset() on other platforms and stx on SPARC.

My experimental code looks like this:

static void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid)
{
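        /*
         * Zero the struct page with eight 8-byte stores of %g0.
         * This assumes sizeof(struct page) == 64.
         */
        __asm__ __volatile__(
        "stx    %%g0, [%0 + 0x00]\n"
        "stx    %%g0, [%0 + 0x08]\n"
        "stx    %%g0, [%0 + 0x10]\n"
        "stx    %%g0, [%0 + 0x18]\n"
        "stx    %%g0, [%0 + 0x20]\n"
        "stx    %%g0, [%0 + 0x28]\n"
        "stx    %%g0, [%0 + 0x30]\n"
        "stx    %%g0, [%0 + 0x38]\n"
        :
        : "r"(page)
        : "memory");    /* the stores modify *page */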
        set_page_links(page, zone, nid, pfn);
        init_page_count(page);
        page_mapcount_reset(page);
        page_cpupid_reset_last(page);

        INIT_LIST_HEAD(&page->lru);
#ifdef WANT_PAGE_VIRTUAL
        /* The shift won't overflow because ZONE_NORMAL is below 4G. */
        if (!is_highmem_idx(zone))
                set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
}
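
For the split between memset() on other platforms and explicit stores on SPARC, a user-space sketch could look roughly like the following. This is not the patch itself: struct mock_page is a hypothetical 64-byte stand-in for "struct page", zero_mock_page() is a made-up helper name, and the compiler is free to combine the plain C stores differently from the hand-written stx sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for struct page; the real layout differs. */
struct mock_page {
        uint64_t words[8];
};

/* The eight-store variant only makes sense for a 64-byte structure. */
_Static_assert(sizeof(struct mock_page) == 64, "sketch assumes 64 bytes");

/* Hypothetical helper: zero one mock_page, choosing the method per arch. */
static inline void zero_mock_page(struct mock_page *page)
{
        assert(page != NULL);
#if defined(__sparc__) && defined(__arch64__)
        /* Eight explicit 8-byte stores, mirroring the eight stx above. */
        page->words[0] = 0;
        page->words[1] = 0;
        page->words[2] = 0;
        page->words[3] = 0;
        page->words[4] = 0;
        page->words[5] = 0;
        page->words[6] = 0;
        page->words[7] = 0;
#else
        /* Everywhere else, leave it to the platform's memset(). */
        memset(page, 0, sizeof(*page));
#endif
}
```

A caller would then invoke zero_mock_page(page) at the top of its init routine, before setting any individual fields, so that later stores are not overwritten by the zeroing.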

Thank you,
Pasha