On 2020-10-14 12:56 p.m., Borislav Petkov wrote:
On Wed, Oct 14, 2020 at 01:32:55AM -0700, Ankur Arora wrote:
This can potentially improve page-clearing bandwidth (see below for
performance numbers for two microarchitectures where it helps and one
where it doesn't) and can help indirectly by consuming less cache
resources.
Any performance benefits are expected for extents larger than LLC-sized
or more -- when we are DRAM-BW constrained rather than cache-BW
constrained.
"potentially", "expected", I don't like those formulations.
That's fair. The reason for those weasel words is mostly because it
is microarchitecture specific.
For example on Intel where I did compare across generations: I see good
performance on Broadwellx, not good on Skylakex and then good again on
some pre-production CPUs.
Do you have
some actual benchmark data where this shows any improvement and not
microbenchmarks only, to warrant the additional complexity?
Yes, guest creation under QEMU (pinned guests) shows similar improvements.
I've posted performance numbers in patches 7, 8 with a simple page-fault
test derived from that.
I can add numbers from QEMU as well.
Thanks,
Ankur