[Apologies for the delay in getting back to you; other work ended up
taking priority and now I'm back to looking at this.]

On Tue, Aug 18, 2020 at 11:25 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 8/17/20 8:00 PM, Matthew Wilcox wrote:
> > On Mon, Aug 17, 2020 at 07:31:39PM -0700, John Hubbard wrote:
> >>>            Real time (s)    Max RSS (KiB)
> >>> anon       2.237081         107088
> >>> memset     2.252241         112180
> >>> refpage    2.243786         107128
> >>>
> >>> We can see that RSS for refpage is almost the same as anon, and real
> >>> time overhead is 44% that of memset.
> >>>
> >>
> >> Are some of the numbers stale, maybe? Try as I might, I cannot combine
> >> anything above to come up with 44%. :)
> >
> > You're not trying hard enough ;-)
> >
> > (2.252241 - 2.237081) / 2.237081 = .00677668801442594166
> > (2.243786 - 2.237081) / 2.237081 = .00299720930981041812
> > .00299720930981041812 / .00677668801442594166 = .44228232189973614648
> >
> > tadaa!
>
> haha, OK then! :) Next time I may try harder, but on the other hand my
> interpretation of the results is still "this is a small effect", even
> if there is a way to make it sound large by comparing the 3rd significant
> digits of the results...
>
> >
> > As I said last time this was posted, I'm just not excited by this. We go
> > from having a 0.68% time overhead down to an 0.30% overhead, which just
> > doesn't move the needle for me. Maybe there's a better benchmark than
> > this to show benefits from this patchset.
> >
>

Remember that this is a "realistic" benchmark, so it's doing plenty of
other work besides faulting pages. So I don't think we should expect
to see a massive improvement here. I ran the pdfium benchmark again
but I couldn't see the same improvements that I got last time. This
seems to be because pdfium has since switched to its own allocator,
bypassing the system allocator.
I think the gains should be larger with the memset optimization that
I've implemented, but I'm still in the process of finding a suitable
realistic benchmark that uses the system allocator.

But I would find a 0.4% perf improvement convincing enough,
personally, given that the workload is realistic. Consider a certain
large company which spends $billions annually on data centers. In that
environment a 0.4% performance improvement on realistic workloads can
translate to $millions of savings. And that's not taking into account
the memory savings which are important both in mobile environments and
in data centers.

> Yes, I wonder if there is an artificial workload that just uses refpages
> really extensively, maybe we can get some good solid improvements shown
> with that? Otherwise, it seems like we've just learned that memset is
> actually pretty good in this case. :)

Yes, it's possible to see the performance improvement here more
clearly with a microbenchmark. I've updated the commit message in v4
to include a microbenchmark program and some performance numbers from
it.

Peter