From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> * remote.html.in: Remove obsolete notes about internals of the RPC protocol * internals/rpc.html.in: Extensive docs on RPC protocol/API * sitemap.html.in: Add new page --- docs/internals/rpc.html.in | 876 ++++++++++++++++++++++++++++++++++++++++++++ docs/remote.html.in | 45 --- docs/sitemap.html.in | 4 + 3 files changed, 880 insertions(+), 45 deletions(-) create mode 100644 docs/internals/rpc.html.in diff --git a/docs/internals/rpc.html.in b/docs/internals/rpc.html.in new file mode 100644 index 0000000..761832a --- /dev/null +++ b/docs/internals/rpc.html.in @@ -0,0 +1,876 @@ +<html> + <body> + <h1>libvirt RPC infrastructure</h1> + + <ul id="toc"></ul> + + <p> + libvirt includes a basic protocol and code to implement + an extensible, secure client/server RPC service. This was + originally designed for communication between the libvirt + client library and the libvirtd daemon. It is also + used for communication to the virtlockd daemon and (soon) + for the libvirt_lxc controller process. This document + provides an overview of the protocol and structure / operation + of the internal RPC library APIs. + </p> + + + <h2><a name="protocol">RPC protocol</a></h2> + + <p> + libvirt uses a simple, variable length, packet based RPC protocol. + All structured data within packets is encoded using the + <a href="http://en.wikipedia.org/wiki/External_Data_Representation">XDR standard</a> + as currently defined by <a href="https://tools.ietf.org/html/rfc4506">RFC 4506</a>. + On any connection running the RPC protocol, there can be multiple + programs active, each supporting one or more versions. A program + defines a set of procedures that it supports. The procedures can + support call+reply method invocation, asynchronous events, + and generic data streams. Method invocations can be overlapped, + so waiting for a reply to one will not block the receipt of the + reply to another outstanding method. 
The protocol was loosely + inspired by the design of SunRPC. The definition of the RPC + protocol is in the file <code>src/rpc/virnetprotocol.x</code> + in the libvirt source tree. + </p> + + <h3><a name="protocolframing">Packet framing</a></h3> + + <p> + On the wire, there is no explicit packet framing marker. Instead + each packet is preceded by an unsigned 32-bit integer giving + the total length of the packet in bytes. This length includes + the 4 bytes of the length word itself. Conceptually the framing + looks like this: + </p> + +<pre> +|~~~ Packet 1 ~~~|~~~ Packet 2 ~~~|~~~ Packet 3 ~~~|~~~ + ++-------+------------+-------+------------+-------+------------+... +| n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | ++-------+------------+-------+------------+-------+------------+... + +|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ + +</pre> + + <h3><a name="protocoldata">Packet data</a></h3> + + <p> + The data in each packet is split into two parts, a short + fixed length header, followed by a variable length payload. + So a packet from the illustration above is more correctly + shown as + </p> + +<pre> + ++-------+-------------+---------------....---+ +| n=U32 | 6*U32 | (n-(7*4))*U8 | ++-------+-------------+---------------....---+ + +|~ Len ~|~ Header ~|~ Payload .... ~| +</pre> + + + <h3><a name="protocolheader">Packet header</a></h3> + <p> + The header contains 6 fields, encoded as signed/unsigned 32-bit + integers. + </p> + + <pre> ++---------------+ +| program=U32 | ++---------------+ +| version=U32 | ++---------------+ +| procedure=S32 | ++---------------+ +| type=S32 | ++---------------+ +| serial=U32 | ++---------------+ +| status=S32 | ++---------------+ + </pre> + + <dl> + <dt><code>program</code></dt> + <dd> + This is an arbitrarily chosen number that will uniquely + identify the "service" running over the stream. 
+ </dd> + <dt><code>version</code></dt> + <dd> + This is the version number of the program, by convention + starting from '1'. When an incompatible change is made + to a program, the version number is incremented. Ideally + both versions will then be supported on the wire in + parallel for backwards compatibility. + </dd> + <dt><code>procedure</code></dt> + <dd> + This is an arbitrarily chosen number that will uniquely + identify the method call, or event associated with the + packet. By convention, procedure numbers start from 1 + and are assigned monotonically thereafter. + </dd> + <dt><code>type</code></dt> + <dd> + <p> + This can be one of the following enumeration values + </p> + <ol> + <li>call: invocation of a method call</li> + <li>reply: completion of a method call</li> + <li>event: an asynchronous event</li> + <li>stream: control info or data from a stream</li> + </ol> + </dd> + <dt><code>serial</code></dt> + <dd> + This is a number that starts from 1 and increases + each time a method call packet is sent. A reply or + stream packet will have a serial number matching the + original method call packet serial. Events always + have the serial number set to 0. + </dd> + <dt><code>status</code></dt> + <dd> + <p> + This can be one of the following enumeration values + </p> + <ol> + <li>ok: a normal packet. This is always set for method calls or events. + For replies it indicates successful completion of the method. For + streams it indicates confirmation of the end of file on the stream.</li> + <li>error: for replies this indicates that the method call failed + and error information is being returned. 
For streams this indicates + that not all data was sent and the stream has aborted</li> + <li>continue: for streams this indicates that further data packets + will be following</li> + </ol></dd> + </dl> + + <h3><a name="protocolpayload">Packet payload</a></h3> + + <p> + The payload of a packet will vary depending on the <code>type</code> + and <code>status</code> fields from the header. + </p> + + <ul> + <li>type=call: the in parameters for the method call, XDR encoded</li> + <li>type=reply+status=ok: the return value and/or out parameters for the method call, XDR encoded</li> + <li>type=reply+status=error: the error information for the method, a virErrorPtr XDR encoded</li> + <li>type=event: the parameters for the event, XDR encoded</li> + <li>type=stream+status=ok: no payload</li> + <li>type=stream+status=error: the error information for the method, a virErrorPtr XDR encoded</li> + <li>type=stream+status=continue: the raw bytes of data for the stream. No XDR encoding</li> + </ul> + + <p> + For the exact payload information for each procedure, consult the XDR protocol + definition for the program+version in question. + </p> + + <h3><a name="wireexamples">Wire examples</a></h3> + + <p> + The following diagrams illustrate some example packet exchanges + between a client and server. + </p> + + <h4><a name="wireexamplescall">Method call</a></h4> + + <p> + A single method call and successful + reply, for a program=8, version=1, procedure=3, with 10 bytes worth + of input args, and 4 bytes worth of return values. The overall input + packet length is 4 + 24 + 10 == 38, and output packet length 32. + </p> + + <pre> + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. 
| --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ + </pre> + + <h4><a name="wireexamplescallerr">Method call with error</a></h4> + + <p> + An unsuccessful method call will instead return an error object. + </p> + + <pre> + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------------------------+ + C <-- |48| 8 | 1 | 3 | 1 | 1 | 1 | .o.oOo.o.oOo.o.oOo.o.oOo | <-- S (error) + +--+-----------------------+--------------------------+ + </pre> + + <h4><a name="wireexamplescallup">Method call with upload stream</a></h4> + + <p> + A method call which also involves uploading some data over + a stream will result in + </p> + + <pre> + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ + + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + ... + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. 
| --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+ + C --> |28| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish) + +--+-----------------------+ + +--+-----------------------+ + C <-- |28| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish) + +--+-----------------------+ + </pre> + + <h4><a name="wireexamplescallbi">Method call bidirectional stream</a></h4> + + <p> + A method call which also involves a bi-directional stream will + result in + </p> + + <pre> + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call) + +--+-----------------------+-----------+ + + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply) + +--+-----------------------+--------+ + + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. 
| <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down) + +--+-----------------------+-------------....-------+ + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + ... + +--+-----------------------+-------------....-------+ + C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up) + +--+-----------------------+-------------....-------+ + +--+-----------------------+ + C --> |28| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish) + +--+-----------------------+ + +--+-----------------------+ + C <-- |28| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish) + +--+-----------------------+ + </pre> + + + <h4><a name="wireexamplescallmany">Method calls overlapping</a></h4> + <pre> + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call 1) + +--+-----------------------+-----------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 2 | 0 | .o.oOo.o. | --> S (call 2) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 2 | 0 | .o.oOo | <-- S (reply 2) + +--+-----------------------+--------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 3 | 0 | .o.oOo.o. 
| --> S (call 3) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 3 | 0 | .o.oOo | <-- S (reply 3) + +--+-----------------------+--------+ + +--+-----------------------+-----------+ + C --> |38| 8 | 1 | 3 | 0 | 4 | 0 | .o.oOo.o. | --> S (call 4) + +--+-----------------------+-----------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply 1) + +--+-----------------------+--------+ + +--+-----------------------+--------+ + C <-- |32| 8 | 1 | 3 | 1 | 4 | 0 | .o.oOo | <-- S (reply 4) + +--+-----------------------+--------+ + </pre> + + + <h2><a name="security">RPC security</a></h2> + + <p> + There are various things to consider to ensure an implementation + of the RPC protocol can be satisfactorily secured. + </p> + + <h3><a name="securitytls">Authentication/encryption</a></h3> + + <p> + The basic RPC protocol does not define or require any specific + authentication/encryption capabilities. A generic solution to + providing encryption for the protocol is to run the protocol + over a TLS encrypted data stream. x509 certificate checks can + be done to form a crude authentication mechanism. It is also + possible for an RPC program to negotiate an encryption / + authentication capability, such as SASL, which may then also + provide per-packet data encryption. Finally the protocol data + stream can of course be tunnelled over transports such as SSH. + </p> + + <h3><a name="securitylimits">Data limits</a></h3> + + <p> + Although the protocol itself defines many arbitrary sized data values in the + payloads, to avoid denial of service attacks there are a number of size limit + checks prior to encoding or decoding data. There is a limit on the maximum + size of a single RPC message, a limit on the maximum string length, and limits + on any other parameter which uses a variable length array. 
These limits can + be raised, subject to agreement between client/server, without otherwise + breaking compatibility of the RPC data on the wire. + </p> + + <h3><a name="securityvalidate">Data validation</a></h3> + + <p> + It is important that all data be fully validated before performing + any actions based on the data. When reading an RPC packet, the + first four bytes must be read and the max packet size limit validated, + before any attempt is made to read the variable length packet data. + After a complete packet has been read, the header must be decoded + and all 6 fields fully validated, before attempting to dispatch + the payload. Once dispatched, the payload can be decoded and passed + onto the appropriate API for execution. The RPC code must not take + any action based on the payload, since it has no way to validate + the semantics of the payload data. It must delegate this to the + execution API (eg corresponding libvirt public API). + </p> + + <h2><a name="internals">RPC internal APIs</a></h2> + + <p> + The generic internal RPC library code lives in the <code>src/rpc/</code> + directory of the libvirt source tree. Unless otherwise noted, the + objects are all threadsafe. The core object types and their + purposes are: + </p> + + <h3><a name="apioverview">Overview of RPC objects</a></h3> + + <p> + The following is a high level overview of the role of each + of the main RPC objects + </p> + + <dl> + <dt><code>virNetSASLContextPtr</code> (virnetsaslcontext.h)</dt> + <dd>The virNetSASLContext APIs maintain SASL state for a network + service (server or client). This is primarily used on the server + to provide a whitelist of allowed SASL usernames for clients. + </dd> + + <dt><code>virNetSASLSessionPtr</code> (virnetsaslcontext.h)</dt> + <dd>The virNetSASLSession APIs maintain SASL state for a single + network connection (socket). 
This is used to perform the multi-step + SASL handshake and perform encryption/decryption of data once + authenticated, via integration with virNetSocket. + </dd> + + <dt><code>virNetTLSContextPtr</code> (virnettlscontext.h)</dt> + <dd>The virNetTLSContext APIs maintain TLS state for a network + service (server or client). This is primarily used on the server + to provide a whitelist of allowed x509 distinguished names, as + well as Diffie-Hellman keys. It can also do validation of + x509 certificates prior to initiating a connection, in order + to improve detection of configuration errors. + </dd> + + <dt><code>virNetTLSSessionPtr</code> (virnettlscontext.h)</dt> + <dd>The virNetTLSSession APIs maintain TLS state for a single + network connection (socket). This is used to perform the multi-step + TLS handshake and perform encryption/decryption of data once + authenticated, via integration with virNetSocket. + </dd> + + <dt><code>virNetSocketPtr</code> (virnetsocket.h)</dt> + <dd>The virNetSocket APIs provide a higher level wrapper around + the raw BSD sockets and getaddrinfo APIs. They allow for creation + of both server and client sockets. Data transports supported are + TCP, UNIX, SSH tunnel or external command tunnel. Internally the + TCP socket implementation uses the getaddrinfo APIs to ensure correct + protocol independent behaviour, thus supporting both IPv4 and IPv6. + The socket APIs can be associated with a virNetSASLSessionPtr or + virNetTLSSessionPtr object to allow seamless encryption/decryption + of all writes and reads. For UNIX sockets it is possible to obtain + the remote client user ID and process ID. Integration with the + libvirt event loop also allows use of callbacks for notification + of various I/O conditions. + </dd> + + <dt><code>virNetMessagePtr</code> (virnetmessage.h)</dt> + <dd>The virNetMessage APIs provide a wrapper around the libxdr + API calls, to facilitate processing and creation of RPC + packets. 
There are convenience APIs for encoding/decoding the + packet headers, encoding/decoding the payload using an XDR + filter, encoding/decoding a raw payload (for streams), and + encoding a virErrorPtr object. There is also a means to + add to/serve from a linked-list queue of messages.</dd> + + <dt><code>virNetClientPtr</code> (virnetclient.h)</dt> + <dd>The virNetClient APIs provide a way to connect to a + remote server and run one or more RPC protocols over + the connection. Connections can be made over TCP, UNIX + sockets, SSH tunnels, or external command tunnels. There + is support for both TLS and SASL session encryption. + The client also supports management of multiple data streams + over each connection. Each client object can be used from + multiple threads concurrently, with method calls/replies + being interleaved on the wire as required. + </dd> + + <dt><code>virNetClientProgramPtr</code> (virnetclientprogram.h)</dt> + <dd>The virNetClientProgram APIs are used to register a + program+version with the connection. This then enables + invocation of method calls, receipt of asynchronous + events and use of data streams, within that program+version. + When created, a set of callbacks must be supplied to take + care of dispatching any incoming asynchronous events. + </dd> + + <dt><code>virNetClientStreamPtr</code> (virnetclientstream.h)</dt> + <dd>The virNetClientStream APIs are used to control transmission and + receipt of data over a stream active on a client. Streams provide + a low latency, unlimited length, bi-directional raw data exchange + mechanism layered over the RPC connection. + </dd> + + <dt><code>virNetServerPtr</code> (virnetserver.h)</dt> + <dd>The virNetServer APIs are used to manage a network server. A + server exposes one or more programs, over one or more services. + It manages multiple client connections invoking multiple RPC + calls in parallel, with dispatch across multiple worker threads. 
+ </dd> + + <dt><code>virNetServerMDNSPtr</code> (virnetservermdns.h)</dt> + <dd>The virNetServerMDNS APIs are used to advertise a server + across the local network, enabling clients to automatically + detect the existence of remote services. This is done by + interfacing with the Avahi mDNS advertisement service. + </dd> + + <dt><code>virNetServerClientPtr</code> (virnetserverclient.h)</dt> + <dd>The virNetServerClient APIs are used to manage I/O related + to a single client network connection. It handles initial + validation and routing of incoming RPC packets, and transmission + of outgoing packets. + </dd> + + <dt><code>virNetServerProgramPtr</code> (virnetserverprogram.h)</dt> + <dd>The virNetServerProgram APIs are used to provide the implementation + of a single program/version set. Primarily this includes a set of + callbacks used to actually invoke the APIs corresponding to + program procedure numbers. It is responsible for all the serialization + of payloads to/from XDR.</dd> + + <dt><code>virNetServerServicePtr</code> (virnetserverservice.h)</dt> + <dd>The virNetServerService APIs are used to connect the server to + one or more network protocols. A single service may involve multiple + sockets (i.e. both IPv4 and IPv6). A service also has an associated + authentication policy for incoming clients. + </dd> + </dl> + + <h3><a name="apiclientdispatch">Client RPC dispatch</a></h3> + + <p> + The client RPC code must allow for multiple overlapping RPC method + calls to be invoked, transmission & receipt of data for multiple + streams and receipt of asynchronous events. Understandably this + involves coordination of multiple threads. + </p> + + <p> + The core requirement in the client dispatch code is that only + one thread is allowed to be performing I/O on the socket at + any time. This thread is said to be "holding the buck". 
When + any other thread comes along and needs to do I/O it must place + its packets on a queue and delegate processing of them to the + thread that has the buck. This thread will send out the method + call, and if it sees a reply will pass it back to the waiting + thread. If the other thread's reply hasn't arrived, by the time + the main thread has got its own reply, then it will transfer + responsibility for I/O to the thread that has been waiting the + longest. It is said to be "passing the buck" for I/O. + </p> + + <p> + When no thread is performing any RPC method call, or sending + stream data there is still a need to monitor the socket for + incoming I/O related to asynchronous events, or stream data + receipt. For this task, a watch is registered with the event + loop which triggers whenever the socket is readable. This + watch is automatically disabled whenever any other thread + grabs the buck, and re-enabled when the buck is released. + </p> + + <h4><a name="apiclientdispatchex1">Example with buck passing</a></h4> + + <p> + In the first example, a second thread issues an API call + while the first thread holds the buck. The reply to the + first call arrives first, so the buck is passed to the + second thread. + </p> + + <pre> + Thread-1 + | + V + Call API1() + | + V + Grab Buck + | Thread-2 + V | + Send method1 V + | Call API2() + V | + Wait I/O V + |<--------Queue method2 + V | + Send method2 V + | Wait for buck + V | + Wait I/O | + | | + V | + Recv reply1 | + | | + V | + Pass the buck----->| + | V + V Wait I/O + Return API1() | + V + Recv reply2 + | + V + Release the buck + | + V + Return API2() + </pre> + + <h4><a name="apiclientdispatchex2">Example without buck passing</a></h4> + + <p> + In this second example, a second thread issues an API call + which is sent and replied to, before the first thread's + API call has completed. 
The first thread thus notifies + the second that its reply is ready, and there is no need + to pass the buck. + </p> + + <pre> + Thread-1 + | + V + Call API1() + | + V + Grab Buck + | Thread-2 + V | + Send method1 V + | Call API2() + V | + Wait I/O V + |<--------Queue method2 + V | + Send method2 V + | Wait for buck + V | + Wait I/O | + | | + V | + Recv reply2 | + | | + V | + Notify reply2------>| + | V + V Return API2() + Wait I/O + | + V + Recv reply1 + | + V + Release the buck + | + V + Return API1() + </pre> + + <h4><a name="apiclientdispatchex3">Example with async events</a></h4> + + <p> + In this example, only one thread is present and it has to + deal with some async events arriving. The events are actually + dispatched to the application from the event loop thread. + </p> + + <pre> + Thread-1 + | + V + Call API1() + | + V + Grab Buck + | + V + Send method1 + | + V + Wait I/O + | Event thread + V ... + Recv event1 | + | V + V Wait for timer/fd + Queue event1 | + | V + V Timer fires + Wait I/O | + | V + V Emit event1 + Recv reply1 | + | V + V Wait for timer/fd + Return API1() | + ... + </pre> + + <h3><a name="apiserverdispatch">Server RPC dispatch</a></h3> + + <p> + The RPC server code must support receipt of incoming RPC requests from + multiple client connections, and parallel processing of all RPC + requests, even many from a single client. This goal is achieved through + a combination of event driven I/O, and multiple processing threads. + </p> + + <p> + The main libvirt event loop thread is responsible for performing all + socket I/O. It will read incoming packets from clients and will + transmit outgoing packets to clients. It will handle the I/O to/from + streams associated with client API calls. When doing client I/O it + will also pass the data through any applicable encryption layer + (through use of the virNetSocket / virNetTLSSession and virNetSASLSession + integration). 
What is paramount is that the event loop thread never + performs any task that can take a non-trivial amount of time. + </p> + + <p> + When reading packets, the event loop will first read the 4 byte length + word. This is validated to make sure it does not exceed the maximum + permissible packet size, and the client is set to allow receipt of the + rest of the packet data. Once a complete packet has been received, the + next step is to decode the RPC header. The header is validated to + ensure the request is sensible, i.e. the server should not receive a + method reply from a client. If the client has not yet authenticated, + a security check is also applied to make sure the procedure is on the + whitelist of those allowed prior to auth. If the packet is a method + call, it will be placed on a global processing queue. The event loop + thread is now done with the packet for the time being. + </p> + + <p> + The server has a pool of worker threads, which wait for method call + packets to be queued. One of them will grab the new method call off + the queue for processing. The first step is to decode the payload of + the packet to extract the method call arguments. The worker does not + attempt to do any semantic validation of the arguments, except to make + sure the size of any variable length fields is below defined limits. + </p> + + <p> + The worker now invokes the libvirt API call that corresponds to the + procedure number in the packet header. The worker is thus kept busy + until the API call completes. The implementation of the API call + is responsible for doing semantic validation of parameters and any + MAC security checks on the objects affected. + </p> + + <p> + Once the API call has completed, the worker thread will take the + return value and output parameters, or error object and encode + them into a reply packet. Again it does not attempt to do any + semantic validation of output data, aside from variable length + field limit checks. 
The worker thread puts the reply packet onto + the transmission queue for the client. The worker is now finished + and goes back to wait for another incoming method call. + </p> + + <p> + The main event loop is back in charge and when the client socket + becomes writable, it will start sending the method reply packet + back to the client. + </p> + + <p> + At any time the libvirt connection object can emit asynchronous + events. These are handled by callbacks in the main event thread. + The callback will simply encode the event parameters into a new + data packet and place the packet on the client transmission + queue. + </p> + + <p> + Incoming and outgoing stream packets are also directly handled + by the main event thread. When an incoming stream packet is + received, instead of placing it in the global dispatch queue + for the worker threads, it is sidetracked into a per-stream + processing queue. When the stream becomes writable, queued + incoming stream packets will be processed, passing their data + payload onto the stream. Conversely when the stream becomes + readable, chunks of data will be read from it, encoded into + new outgoing packets, and placed on the client's transmit + queue + </p> + + <h4><a name="apiserverdispatchex1">Example with overlapping methods</a></h4> + + <p> + This example illustrates processing of two incoming methods with + overlapping execution + </p> + + <pre> + Event thread Worker 1 Worker 2 + | | | + V V V + Wait I/O Wait Job Wait Job + | | | + V | | + Recv method1 | | + | | | + V | | + Queue method1 V | + | Serve method1 | + V | | + Wait I/O V | + | Call API1() | + V | | + Recv method2 | | + | | | + V | | + Queue method2 | V + | | Serve method2 + V V | + Wait I/O Return API1() V + | | Call API2() + | V | + V Queue reply1 | + Send reply1 | | + | V V + V Wait Job Return API2() + Wait I/O | | + | ... V + V Queue reply2 + Send reply2 | + | V + V Wait Job + Wait I/O | + | ... + ... 
+ </pre> + + <h4><a name="apiserverdispatchex2">Example with stream data</a></h4> + + <p> + This example illustrates processing of stream data + </p> + + <pre> + Event thread + | + V + Wait I/O + | + V + Recv stream1 + | + V + Queue stream1 + | + V + Wait I/O + | + V + Recv stream2 + | + V + Queue stream2 + | + V + Wait I/O + | + V + Write stream1 + | + V + Write stream2 + | + V + Wait I/O + | + ... + </pre> + + </body> +</html> diff --git a/docs/remote.html.in b/docs/remote.html.in index b554950..6a8e830 100644 --- a/docs/remote.html.in +++ b/docs/remote.html.in @@ -53,9 +53,6 @@ machines through authenticated and encrypted connections. <li> <a href="#Remote_limitations">Limitations</a> </li> - <li> - <a href="#Remote_implementation_notes">Implementation notes</a> - </li> </ul> <h3> <a name="Remote_basic_usage">Basic usage</a> @@ -880,47 +877,5 @@ just read-write/read-only as at present. <p> Please come and discuss these issues and more on <a href="https://www.redhat.com/mailman/listinfo/libvir-list" title="libvir-list mailing list">the mailing list</a>. </p> - <h3> - <a name="Remote_implementation_notes">Implementation notes</a> - </h3> - <p> -The current implementation uses <a href="http://en.wikipedia.org/wiki/External_Data_Representation" title="External Data Representation">XDR</a>-encoded packets with a -simple remote procedure call implementation which also supports -asynchronous messaging and asynchronous and out-of-order replies, -although these latter features are not used at the moment. -</p> - <p> -The implementation should be considered <b>strictly internal</b> to -libvirt and <b>subject to change at any time without notice</b>. If -you wish to talk to libvirtd, link to libvirt. If there is a problem -that means you think you need to use the protocol directly, please -first discuss this on <a href="https://www.redhat.com/mailman/listinfo/libvir-list" title="libvir-list mailing list">the mailing list</a>. 
-</p> - <p> -The messaging protocol is described in -<code>qemud/remote_protocol.x</code>. -</p> - <p> -Authentication and encryption (for TLS) is done using <a href="http://www.gnu.org/software/gnutls/" title="GnuTLS project page">GnuTLS</a> and the RPC protocol is unaware of this layer. -</p> - <p> -Protocol messages are sent using a simple 32 bit length word (encoded -XDR int) followed by the message header (XDR -<code>remote_message_header</code>) followed by the message body. The -length count includes the length word itself, and is measured in -bytes. Maximum message size is <code>REMOTE_MESSAGE_MAX</code> and to -avoid denial of services attacks on the XDR decoders strings are -individually limited to <code>REMOTE_STRING_MAX</code> bytes. In the -TLS case, messages may be split over TLS records, but a TLS record -cannot contain parts of more than one message. In the common RPC case -a single <code>REMOTE_CALL</code> message is sent from client to -server, and the server then replies synchronously with a single -<code>REMOTE_REPLY</code> message, but other forms of messaging are -also possible. -</p> - <p> -The protocol contains support for multiple program types and protocol -versioning, modelled after SunRPC. -</p> </body> </html> diff --git a/docs/sitemap.html.in b/docs/sitemap.html.in index 897ee94..f50a8d2 100644 --- a/docs/sitemap.html.in +++ b/docs/sitemap.html.in @@ -289,6 +289,10 @@ <span>Spawning commands from libvirt driver code</span> </li> <li> + <a href="internals/rpc.html">RPC protocol & APIs</a> + <span>RPC protocol information and API / dispatch guide</span> + </li> + <li> <a href="internals/locking.html">Lock managers</a> <span>Use lock managers to protect disk content</span> </li> -- 1.7.6
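
The framing and header layout described in the new rpc.html.in page (a 32-bit length word that counts itself, then six 32-bit header fields, then the payload) can be sketched roughly as follows. This is only an illustration and not libvirt code: the helper names and the MAX_PACKET limit are hypothetical, and the numeric values for type and status are assumptions read off the wire examples (call=0, reply=1, stream=3, ok=0, continue=2), with event=2 and error=1 inferred from the enumeration order. Big-endian packing follows from XDR.

```python
import struct

# XDR integers are big-endian. Header field order follows the page:
# program=U32, version=U32, procedure=S32, type=S32, serial=U32, status=S32.
LENGTH = struct.Struct(">I")
HEADER = struct.Struct(">IIiiIi")

# Assumed numeric values, inferred from the wire examples (not authoritative).
CALL, REPLY, EVENT, STREAM = 0, 1, 2, 3
OK, ERROR, CONTINUE = 0, 1, 2

# Hypothetical size limit, standing in for the real protocol maximum.
MAX_PACKET = 256 * 1024


def encode_packet(program, version, procedure, type_, serial, status, payload=b""):
    """Build one packet: length word (which counts itself), header, payload."""
    total = LENGTH.size + HEADER.size + len(payload)
    if total > MAX_PACKET:
        raise ValueError("packet exceeds size limit")
    header = HEADER.pack(program, version, procedure, type_, serial, status)
    return LENGTH.pack(total) + header + payload


def decode_packets(buf):
    """Split a byte buffer into (header, payload) pairs.

    Returns the complete packets plus any leftover bytes belonging to a
    packet that has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(buf) - offset >= LENGTH.size:
        (total,) = LENGTH.unpack_from(buf, offset)
        # Validate the length word before reading the variable length data.
        if total < LENGTH.size + HEADER.size or total > MAX_PACKET:
            raise ValueError("invalid packet length %d" % total)
        if len(buf) - offset < total:
            break  # incomplete packet, wait for more data
        header = HEADER.unpack_from(buf, offset + LENGTH.size)
        payload = buf[offset + LENGTH.size + HEADER.size:offset + total]
        packets.append((header, payload))
        offset += total
    return packets, buf[offset:]
```

With these helpers, the method call from the first wire example (program=8, version=1, procedure=3, 10 bytes of arguments) encodes to 38 bytes, and a stream packet with no payload comes to 4 + 24 = 28 bytes under the same length rule. Checking the length word against the limit before touching the rest of the data mirrors the ordering the page asks for under data validation.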