Re: [PATCH v5 00/10] KVM: arm64: Add support for hypercall services selection

Gavin Shan <gshan@xxxxxxxxxx> · Mon, 18 Apr 2022 10:53:59 +0800

Hi Marc,

On 4/15/22 4:58 PM, Marc Zyngier wrote:
On Fri, 15 Apr 2022 07:44:55 +0100,
Gavin Shan <gshan@xxxxxxxxxx> wrote:
On 4/7/22 9:15 AM, Raghavendra Rao Ananta wrote:
Continuing the discussion from [1], the series tries to add support
for the userspace to elect the hypercall services that it wishes
to expose to the guest, rather than the guest discovering them
unconditionally. The idea employed by the series was taken from
[1] as suggested by Marc Z.

In a broad sense, the concept is similar to the current implementation
of PSCI interface- create a 'firmware psuedo-register' to handle the
firmware revisions. The series extends this idea to all the other
hypercalls such as TRNG (True Random Number Generator), PV_TIME
(Paravirtualized Time), and PTP (Precision Time protocol).

For better categorization and future scaling, these firmware registers
are categorized based on the service call owners. Also, unlike the
existing firmware psuedo-registers, they hold the features supported
in the form of a bitmap.

During the VM initialization, the registers holds an upper-limit of
the features supported by each one of them. It's expected that the
userspace discover the features provided by each register via GET_ONE_REG,
and writeback the desired values using SET_ONE_REG. KVM allows this
modification only until the VM has started.

Some of the standard function-ids, such as ARM_SMCCC_VERSION_FUNC_ID,
need not be associated with a feature bit. For such ids, the series
introduced an allowed-list, hvc_func_default_allowed_list[], that holds
all such ids. As a result, the functions that are not elected by userspace,
or if they are not a part of this allowed-list, will be denied for when
the guests invoke them.

Older VMMs can simply ignore this interface and the hypercall services
will be exposed unconditionally to the guests, thus ensuring backward
compatibility.

[...]

I rethinking about the design again and just get one question. Hopefully,
someone have the answer for us. The newly added 3 pseudo registers and
the existing ones like KVM_REG_ARM_PSCI_VERSION are all tied up with
vcpu, instead of VM. I don't think it's correct. I'm not sure if VM-scoped
pseudo registers aren't allowed by ARM architecture or the effort isn't
worthy to support it.

We have had that discussion before (around version 2 of this series,
if I remember well).

Yeah, I'm chime-in this series lately. There must be some discussions,
including this topic, I missed :)

These pseudo registers are introduced to present the available hypercalls,
and then they can be disabled from userspace. In the implementation, these 3
registers are vcpu scoped. It means that multiple vcpus can be asymmetric
in terms of usable hypercalls. For example, ARM_SMCCC_TRNG hypercalls
can be enabled on vcpu0, but disabled on vcpu1. I don't think it's expected.

No, that's not the way this is supposed to work. These hypercalls are
of course global, even if the accessor is per-vcpu. This is similar to
tons of other things, such as some of the PMU data, the timer virtual
offset... the list goes on. If that's not what this code does, then it
is a bug and it needs to be fixed.

Ok.

On the other hand, the information stored in these 3 registers needs to
be migrated through {GET,SET}_ONE_REG by VMM (QEMU). all the information
stored in these 3 registers are all same on all vcpus, which is exactly
as we expect. In migration circumstance, we're transporting identical
information for all vcpus and it's unnecessary.

Yes, we all understand that. My response to that was (and still is):

- There is no need to invent a new userspace interface. The one we
   have is terrible enough, and we don't need another square wheel that
   would need to be maintained beside the existing one.

- Let's say we have 1024 new pseudo-registers, 1024 vcpus, 64bit regs:
   that's 8MB worth of extra data. This is not insignificant, but also
   not really a problem given that such a large VM is probably attached
   to a proportionally large amount of memory. In practice, we're
   talking of less than 10 registers, and less than 100 vcpus. A crazy
   8kB at most. Who cares?

- If this is eventually deemed to be a *real* scalability problem, we
   can always expose a map of registers that are global, and let
   userspace know that it can elide the rest. Problem solved, backward
   compatibility preserved. And I'm willing to bet that we won't need
   it in my lifetime.

The reason why I raised question is just to check if it's a missed
point in the design. As I said, I obviously missed the previous
discussions and glad that this has been discussed through.

Thanks for the details. Yes, it's totally fine to migrate 8KB data.
Besides, VMM (QEMU) can choose to do migration on one single vcpu,
instead of all of them, as you said.

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm