Hi Peter,
On 2020/11/19 2:07, Peter Zijlstra wrote:
On Thu, Nov 19, 2020 at 12:15:09AM +0800, Like Xu wrote:
ISTR there was lots of fail trying to virtualize it earlier. What's
changed? There's 0 clues here.
Ah, now we have EPT-friendly PEBS facilities supported since Ice Lake
which makes guest PEBS feature possible w/o guest memory pinned.
OK.
Why are the host and guest DS area separate, why can't we map them to
the exact same physical pages?
If we map both guest and host DS_AREA to the exact same physical pages,
- the guest can access the host PEBS records, which means that the host
IP maybe leaked, because we cannot predict the time guest drains records and
it would be over-designed to clean it up before each vm-entry;
- different tasks/vcpus on the same pcpu cannot share the same PEBS DS
settings from the same physical page. For example, some require large
PEBS and reset values, while others do not.
Like many guest msrs, we use the separate guest DS_AREA for the guest's
own use and it avoids mutual interference as little as possible.
OK, but the code here wanted to inspect the guest DS from the host. It
states this is somehow complicated/expensive. But surely we can at the
very least map the first guest DS page somewhere so we can at least
access the control bits without too much magic.
We note that the SDM has a contiguous present memory mapping
assumption about the DS save area and the PEBS buffer area.
Therefore, we revisit your suggestion here and move it a bit forward:
When the PEBS is enabled, KVM will cache the following values:
- gva ds_area (kvm msr trap)
- hva1 for "gva ds_area" (walk guest page table)
- hva2 for "gva pebs_buffer_base" via hva1 (walk guest page table)
if the "gva ds_area" cache hits,
- access PEBS "interrupt threshold" and "Counter Reset[]" via hva1
- get "gva2 pebs_buffer_base" via __copy_from_user(hva1)
if the "gva2 pebs_buffer_base" cache hits,
- we get "gva2 pebs_index" via __copy_from_user(hva2),
- rewrite the guest PEBS records via hva2 and pebs_index
If any cache misses, setup the cache values via walking tables again.
I wonder if you would agree with this optimization idea,
we look forward to your confirmation for the next step.
Thanks,
Like Xu