On 2024/3/6 2:26 AM, WANG Xuerui wrote:
On 3/4/24 17:10, maobibo wrote:
On 2024/3/2 5:41 PM, WANG Xuerui wrote:
On 3/2/24 16:47, Bibo Mao wrote:
[snip]
+Querying for existence
+======================
+
+To find out if we're running on KVM or not, cpucfg can be used with index
+CPUCFG_KVM_BASE (0x40000000). The cpucfg range 0x40000000 - 0x400000FF is
+marked as a specially reserved range; all existing and future processors
+will not implement any features in this range.
+
+When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE (0x40000000)
+returns the magic string "KVM\0".
+
+Once you have determined you're running under a PV-capable KVM, you can now
+use hypercalls as described below.
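As a rough illustration of the probe described in this hunk (just a sketch: read_cpucfg() is the usual LoongArch cpucfg accessor, and the constant names plus the 0x004d564b encoding of "KVM\0" follow this series rather than any final header):

  #define CPUCFG_KVM_BASE         0x40000000
  #define KVM_SIGNATURE           0x004d564b      /* "KVM\0" as a little-endian word */

  /* Sketch: probe the reserved cpucfg leaf for the KVM magic. */
  static bool kvm_para_available(void)
  {
          return read_cpucfg(CPUCFG_KVM_BASE) == KVM_SIGNATURE;
  }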
So this is still an approach similar to the x86 CPUID-based
implementation. But here the non-privileged behavior isn't specified
-- I see there is PLV checking in Patch 3 but it's safer to have the
requirement spelled out here too.
But I still think this approach touches more places than strictly
needed. As it is currently the case in
arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for
a bit IOCSRF_VM that already signifies presence of a hypervisor; if
this information can be interpreted as availability of the HVCL
instruction (which I suppose is the case -- a hypervisor can always
trap-and-emulate in case HVCL isn't provided by hardware), here we
can already start making calls with HVCL.
We can and should define a uniform interface for probing the hypervisor kind, similar to the centrally-managed RISC-V SBI implementation ID registry [1]: otherwise future non-KVM hypervisors would have to

1. somehow pretend they are KVM and eventually fail to do so, leading to subtle incompatibilities,
2. invent another way of probing for their existence,
3. piggy-back on the current KVM definition, which is inelegant (reading the LoongArch-KVM-defined CPUCFG leaf only to find it's not KVM) and utterly makes the definition here *not* KVM-specific.
[1]:
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc
Sorry, I know nothing about RISC-V. Can you describe how
sbi_get_mimpid() is implemented in detail? Is it a simple library call,
or does it need to trap into secure mode or hypervisor mode?
For these simple interfaces you can expect a trivial implementation. See
for example [OpenSBI]'s respective code.
[OpenSBI]:
https://github.com/riscv-software-src/opensbi/blob/v1.4/lib/sbi/sbi_ecall.c#L29-L34
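On the guest side it is equally small: sbi_get_mimpid() boils down to a single ECALL into whatever implements SBI for the supervisor (firmware on bare metal, the hypervisor for a guest). Roughly (a sketch, not the actual Linux code):

  static long sbi_get_mimpid(void)
  {
          register long a0 asm("a0");
          register long a1 asm("a1");
          register long a6 asm("a6") = 6;         /* SBI_EXT_BASE_GET_MIMPID */
          register long a7 asm("a7") = 0x10;      /* SBI_EXT_BASE */

          asm volatile("ecall"
                       : "=r"(a0), "=r"(a1)
                       : "r"(a6), "r"(a7)
                       : "memory");

          return a0 ? -1 : a1;    /* a0 = SBI error code, a1 = value */
  }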
My take on this:
To check if we are running on Linux KVM or not, first check IOCSR 0x8
(``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are
running under a hypervisor if the bit is set. Then invoke ``HVCL 0``
to find out the hypervisor implementation ID; a return value in
``$a0`` of 0x004d564b (``KVM\0``) means Linux KVM, in which case the
rest of the convention applies.
I do not think so. `HVCL 0` requires that the hypercall ABI be unified
across all hypervisors. That is not necessary; each hypervisor can have
its own hypercall ABI.
I don't think agreeing upon the ABI of HVCL 0 is going to affect ABI of
other hypercalls. Plus, as long as people don't invent something that
they think is smart and deviate from the platform calling convention,
I'd expect every hypervisor to have identical ABI apart from the exact
HVCL operation ID chosen.
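Spelling my earlier suggestion out as a sketch (iocsr_read32(), LOONGARCH_IOCSR_FEATURES and IOCSRF_VM are the existing accessor/constants from asm/loongarch.h; the HVCL wrapper and its operation ID here are placeholders for whatever gets agreed on):

  /* Sketch: ask the hypervisor for its implementation ID via HVCL 0. */
  static unsigned long hypervisor_impl_id(void)
  {
          register unsigned long a0 asm("a0") = 0;

          asm volatile("hvcl 0" : "+r"(a0) : : "memory");
          return a0;
  }

  static bool running_on_linux_kvm(void)
  {
          /* IOCSRF_VM set: some hypervisor is present, so HVCL is usable */
          if (!(iocsr_read32(LOONGARCH_IOCSR_FEATURES) & IOCSRF_VM))
                  return false;

          return hypervisor_impl_id() == 0x004d564b;      /* "KVM\0" */
  }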
+
+KVM hypercall ABI
+=================
+
+The hypercall ABI on KVM is simple: only one scratch register, a0 (v0), is
+used, and at most five generic registers are used as input parameters. FP
+registers and vector registers are not used as input registers and should
+not be modified during a hypercall. Hypercall functions can be inlined since
+there is only one scratch register.
It should be pointed out explicitly that on hypercall return all
architectural state except ``$a0`` is preserved. Or is the whole
``$a0`` - ``$t8`` range clobbered, just like with Linux syscalls?
Well, a return value description will be added. What do you think the
return value of the KVM_HCALL_FUNC_PV_IPI hypercall should mean: the
number of CPUs with the IPI delivered successfully, like KVM x86, or
simply success/failure?
What is the advantage of clobbering $a0 - $t8?
Because then a hypercall is going to behave identically to an ordinary C
function call, which is easy for people and compilers to understand.
If you look at the detailed behavior of hypercalls/syscalls, the
conclusion may be different.
If T0 - T8 are clobbered by the hypercall instruction, the hypercall
caller needs to save the clobbered registers. As it is now, the hypercall
exception saves/restores all the registers during VM exits, so the
hypercall caller does not need to save the general registers and they do
not need to be scratch registers in the hypercall ABI.
Until now all the discussion has been at the macro level, with no detailed
code. Can you show me some example code where T0 - T8 need not be
saved/restored during a LoongArch hypercall exception?
Regards
Bibo Mao
It seems that with the Linux LoongArch syscall, t0 - t8 are clobbered
rather than a0 - t8. Am I wrong?
You're right, my memory has faded a bit. But I think my reasoning still
holds.
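To ground this with code: assuming the ABI stays as documented below ($a0 carries the function number in and the return code out, $a1 - $a5 carry parameters, everything else preserved by the hypervisor), a one-argument wrapper would look roughly like the sketch below, and nothing beyond $a0 has to be marked as clobbered for the compiler. The HVCL immediate here is a placeholder, not something taken from this series.

  static inline long kvm_hypercall1(unsigned long fid, unsigned long arg0)
  {
          register unsigned long a0 asm("a0") = fid;      /* function number in, return code out */
          register unsigned long a1 asm("a1") = arg0;     /* 1st parameter */

          /* the HVCL immediate is a placeholder */
          asm volatile("hvcl 0"
                       : "+r"(a0)
                       : "r"(a1)
                       : "memory");

          return a0;
  }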
+
+The parameters are as follows:
+
+ ======== ================ ================
+ Register IN OUT
+ ======== ================ ================
+ a0 function number Return code
+ a1 1st parameter -
+ a2 2nd parameter -
+ a3 3rd parameter -
+ a4 4th parameter -
+ a5 5th parameter -
+ ======== ================ ================
+
+Return codes can be as follows:
+
+ ==== =========================
+ Code Meaning
+ ==== =========================
+ 0 Success
+ -1 Hypercall not implemented
+ -2 Hypercall parameter error
What about re-using well-known errno's, like -ENOSYS for "hypercall
not implemented" and -EINVAL for "invalid parameter"? This could save
people some hair when more error codes are added in the future.
No, I do not think so. This is the hypercall return value, and other
OSes need to see it. -ENOSYS/-EINVAL may not be understandable for a
non-Linux OS.
As long as you accept the associated costs (documentation, potential
mapping back-and-forth, proper conveyance of information etc.) I have no
problem with that either.
+ ==== =========================
+
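If the raw return codes stay OS-neutral like this, the Linux guest side can still do the errno mapping mentioned above in one small helper, e.g. (just a sketch; kvm_hcall_to_errno() is a made-up name):

  static int kvm_hcall_to_errno(long ret)
  {
          switch (ret) {
          case 0:
                  return 0;
          case -1:
                  return -ENOSYS;         /* hypercall not implemented */
          case -2:
                  return -EINVAL;         /* hypercall parameter error */
          default:
                  return -EIO;            /* anything not yet documented */
          }
  }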
+KVM Hypercalls Documentation
+============================
+
+The template for each hypercall is:
+1. Hypercall name
+2. Purpose
+
+1. KVM_HCALL_FUNC_PV_IPI
+------------------------
+
+:Purpose: Send IPIs to multiple vCPUs.
+
+- a0: KVM_HCALL_FUNC_PV_IPI
+- a1: lower part of the bitmap of destination physical CPUIDs
+- a2: higher part of the bitmap of destination physical CPUIDs
+- a3: the lowest physical CPUID in bitmap
"CPU ID", instead of "CPUID" for clarity: I suppose most people
reading this also know about x86, so "CPUID" could evoke the wrong
intuition.
Both "CPU core id" or "CPUID" are ok for me since there is csr
register named LOONGARCH_CSR_CPUID already.
I was suggesting to minimize confusion even at theoretical level,
because you cannot assume anything about your readers. Feel free to
provide extra info (e.g. the "CPU core ID" you suggested) as long as it
helps to resolve any potential ambiguity / confusion.
This function is equivalent to the C signature "void hypcall(int
func, u128 mask, int lowest_cpu_id)", which I think is fine, but one
can also see that the return value description is missing.
Sure, the return value description will be added.
And it is not equivalent to the C signature "void hypcall(int func,
u128 mask, int lowest_cpu_id)": int/u128/struct types are not permitted
in the hypercall ABI; all parameters are "unsigned long".
I was talking about the ABI in a C perspective, and the register usage
is identical. You can define the KVM hypercall ABI however you want but
having some nice analogy/equivalence would help a lot, especially for
people not already familiar with all the details.
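For instance, the register usage above could be illustrated with a sketch like the one below; kvm_hypercall3() and the HVCL immediate are placeholders following the parameter table, not code from this series, and the return-value semantics are whatever ends up being documented:

  static inline long kvm_hypercall3(unsigned long fid, unsigned long p1,
                                    unsigned long p2, unsigned long p3)
  {
          register unsigned long a0 asm("a0") = fid;
          register unsigned long a1 asm("a1") = p1;
          register unsigned long a2 asm("a2") = p2;
          register unsigned long a3 asm("a3") = p3;

          /* the HVCL immediate is a placeholder */
          asm volatile("hvcl 0"
                       : "+r"(a0)
                       : "r"(a1), "r"(a2), "r"(a3)
                       : "memory");
          return a0;
  }

  static void pv_send_ipi(unsigned long low, unsigned long high,
                          unsigned int min_cpuid)
  {
          /* a1/a2 = destination bitmap halves, a3 = lowest physical CPU ID */
          long ret = kvm_hypercall3(KVM_HCALL_FUNC_PV_IPI, low, high, min_cpuid);

          if (ret < 0)
                  pr_warn_once("PV IPI hypercall failed: %ld\n", ret);
  }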