On Wed 31-05-17 23:35:48, Pasha Tatashin wrote:
> >OK, so why cannot we make zero_struct_page 8x 8B stores, other arches
> >would do memset. You said it would be slower but would that be
> >measurable? I am sorry to be so persistent here but I would be really
> >happier if this didn't depend on the deferred initialization. If this is
> >absolutely a no-go then I can live with that of course.
>
> Hi Michal,
>
> This is actually a very good idea. I just did some measurements, and it
> looks like performance is very good.
>
> Here is data from SPARC-M7 with 3312G memory with single thread performance:
>
> Current:
> memset() in memblock allocator takes: 8.83s
> __init_single_page() take: 8.63s
>
> Option 1:
> memset() in __init_single_page() takes: 61.09s (as we discussed because of
> membar overhead, memset should really be optimized to do STBI only when size
> is 1 page or bigger).
>
> Option 2:
>
> 8 stores (stx) in __init_single_page(): 8.525s!
>
> So, even for single thread performance we can double the initialization
> speed of "struct page" on SPARC by removing memset() from memblock, and
> using 8 stx in __init_single_page(). It appears we never miss L1 in
> __init_single_page() after the initial 8 stx.

OK, that is good to hear and it actually matches my understanding that
writes to a single cacheline should not add any overhead.

Thanks!
-- 
Michal Hocko
SUSE Labs
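
For reference, the "8x 8B stores" variant discussed above could look roughly
like the user-space sketch below. It is only an illustration, not the actual
kernel patch: struct fake_page and all three helper names are hypothetical,
and the struct is assumed to be exactly 64 bytes, whereas the real
struct page layout depends on the kernel configuration. Note also that a
compiler is free to turn the explicit stores into a memset() call or the
other way around, so the generated code should be checked on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct page. Assumed to be 64 bytes, i.e.
 * eight 8-byte words; the real structure is config-dependent.
 */
struct fake_page {
	uint64_t words[8];
};

static_assert(sizeof(struct fake_page) == 64,
	      "sketch assumes a 64-byte structure");

/* Option 2 from the thread: eight explicit 8-byte stores. */
static inline void zero_fake_page_stores(struct fake_page *p)
{
	uint64_t *w = p->words;

	w[0] = 0;
	w[1] = 0;
	w[2] = 0;
	w[3] = 0;
	w[4] = 0;
	w[5] = 0;
	w[6] = 0;
	w[7] = 0;
}

/* Generic fallback for other architectures: a plain memset(). */
static inline void zero_fake_page_memset(struct fake_page *p)
{
	memset(p, 0, sizeof(*p));
}

/* Returns 1 if every word of the structure is zero, 0 otherwise. */
static inline int fake_page_is_zero(const struct fake_page *p)
{
	size_t i;

	for (i = 0; i < 8; i++)
		if (p->words[i] != 0)
			return 0;
	return 1;
}
```

Both helpers leave the structure in the same all-zero state; the sketch only
shows the two shapes of the code and says nothing about their relative speed,
which has to be measured on real hardware as was done above.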