[PATCH v2 2/9] KVM: Add documentation for VCPU requests

Andrew Jones <drjones@xxxxxxxxxx> · Fri, 31 Mar 2017 18:06:51 +0200

Signed-off-by: Andrew Jones <drjones@xxxxxxxxxx>
---
 Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst

diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
new file mode 100644
index 000000000000..ea4a966d5c8a
--- /dev/null
+++ b/Documentation/virtual/kvm/vcpu-requests.rst
@@ -0,0 +1,114 @@
+=================
+KVM VCPU Requests
+=================
+
+Overview
+========
+
+KVM supports an internal API enabling threads to request a VCPU thread to
+perform some activity.  For example, a thread may request a VCPU to flush
+its TLB with a VCPU request.  The API consists of only four calls::
+
+  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
+  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Check if any requests are pending for VCPU @vcpu. */
+  bool kvm_request_pending(struct kvm_vcpu *vcpu);
+
+  /* Make request @req of VCPU @vcpu. */
+  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
+  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+
+Typically a requester wants the VCPU to perform the activity as soon
+as possible after making the request.  This means most calls to
+kvm_make_request() are followed by a call to kvm_vcpu_kick(), while
+kvm_make_all_cpus_request() has the kicking of all VCPUs built into
+it.
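+
+As a rough sketch of that pattern, a TLB flush request might be made and
+serviced as follows.  The flush_tlb() helper is hypothetical and stands
+in for whatever the architecture does to handle the request::
+
+  /* Requester: make the request, then kick the VCPU. */
+  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+  kvm_vcpu_kick(vcpu);
+
+  /* VCPU thread, before entering guest mode. */
+  if (kvm_request_pending(vcpu)) {
+          if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+                  flush_tlb(vcpu);        /* hypothetical helper */
+  }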
+
+VCPU Kicks
+----------
+
+A VCPU kick does one of three things:
+
+ 1) wakes a sleeping VCPU (which sleeps outside guest mode).
+ 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
+    out.
+ 3) nothing, when the VCPU is already outside guest mode and not sleeping.
+
+VCPU Request Internals
+======================
+
+VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
+means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
+may also be used.  The first 8 bits are reserved for
+architecture-independent requests; all additional bits are available for
+architecture-dependent requests.
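+
+As a minimal sketch of the bitmap view, an architecture-dependent request
+could be defined just past the reserved bits and then inspected with the
+generic bitops.  KVM_REQ_EXAMPLE is a hypothetical request number used
+only for illustration; it is not an existing request::
+
+  #define KVM_REQ_EXAMPLE 8    /* hypothetical; first non-reserved bit */
+
+  /* Peek at the request without clearing it. */
+  if (test_bit(KVM_REQ_EXAMPLE, &vcpu->requests))
+          pr_debug("example request pending\n");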
+
+VCPU Requests with Associated State
+===================================
+
+Requesters that want the requested VCPU to handle new state need to ensure
+the state is observable to the requested VCPU thread's CPU at the time the
+CPU observes the request.  This means a write memory barrier should be
+inserted between the preparation of the state and the write of the VCPU
+request bitmap.  Additionally, on the requested VCPU thread's side, a
+corresponding read barrier should be issued after reading the request bit
+and before proceeding to use the state associated with it.  See the kernel
+memory barrier documentation [2].
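+
+The pairing described above might look roughly like the following.  This
+is only a sketch: KVM_REQ_EXAMPLE, the vcpu->arch.new_state field and
+consume_state() are hypothetical names introduced for illustration::
+
+  /* Requester: publish the state, then raise the request. */
+  vcpu->arch.new_state = value;            /* hypothetical field */
+  smp_wmb();                               /* state before request bit */
+  kvm_make_request(KVM_REQ_EXAMPLE, vcpu); /* hypothetical request */
+  kvm_vcpu_kick(vcpu);
+
+  /* Requested VCPU thread: observe the request, then read the state. */
+  if (kvm_check_request(KVM_REQ_EXAMPLE, vcpu)) {
+          smp_rmb();                       /* request bit before state */
+          consume_state(vcpu->arch.new_state); /* hypothetical helper */
+  }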
+
+VCPU Requests and Guest Mode
+============================
+
+Things work as long as the VCPU is either in guest mode, in which case it
+gets an IPI and will definitely see the request, or is outside guest mode
+but has yet to do its final request check, in which case that check will
+see the request.  However, the transition from outside to inside guest
+mode, after the final request check has been made, opens a window where
+a request could be made that the VCPU would not see until it exits guest
+mode some time later.  See the table below.
+
++------------------+-----------------+----------------+--------------+
+| vcpu->mode       | done last check | kick sends IPI | request seen |
++==================+=================+================+==============+
+| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
++------------------+-----------------+----------------+--------------+
+| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
++------------------+-----------------+----------------+--------------+
+| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
++------------------+-----------------+----------------+--------------+
+
+To ensure the third scenario shown in the table above cannot happen, we
+need to ensure the VCPU's mode change is observable by all CPUs prior to
+its final request check and that a requester's request is observable by
+the requested VCPU prior to the kick.  To do that we need general memory
+barriers between each pair of operations involving mode and requests, i.e.::
+
+  CPU_i                                  CPU_j
+  -----------------------------------------------------------------------
+  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
+  smp_mb();                              smp_mb();
+  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
+      handle_requests();                     send_IPI(vcpu->cpu);
+
+Whether explicit barriers are needed, or reliance on implicit barriers is
+sufficient, is architecture dependent.  Alternatively, an architecture may
+choose to just always send the IPI, since skipping it when it is not
+necessary is only an optimization.
+
+Additionally, the error-prone third scenario described above also shows
+why a request-less VCPU kick is almost never correct.  A kick that does
+not generate an IPI results in an action by the target VCPU only because
+of the final kvm_request_pending() check; with no request pending, the
+kick may not initiate anything useful at all.  If, for instance, a
+request-less kick was made to a VCPU that was just about to set its mode
+to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
+entry without actually having done whatever it was the kick was meant to
+initiate.
+
+References
+==========
+
+[1] Documentation/core-api/atomic_ops.rst
+[2] Documentation/memory-barriers.txt
-- 
2.9.3