On Tue, May 26, 2009 at 06:57:18PM +0100, Richard W.M. Jones wrote:
> FWIW this is the libguestfs RPC protocol:
>
> http://et.redhat.com/~rjones/libguestfs/guestfs.3.html#communication_protocol
> http://git.et.redhat.com/?p=libguestfs.git;a=blob;f=src/guestfs_protocol.x;hb=HEAD
>
> It's not directly relevant because at present the server is single-
> threaded and answers calls in order.

It is actually pretty relevant from the wire protocol POV, and matches
the ideas I'd been having. With your chunked encoding, you've only got
4 bytes overhead per chunk sent. I was thinking of introducing a new
message type alongside the existing three

    enum remote_message_direction {
        REMOTE_CALL = 0,     /* client -> server */
        REMOTE_REPLY = 1,    /* server -> client */
        REMOTE_MESSAGE = 2   /* server -> client, asynchronous [NYI] */
    };

aka,

    REMOTE_DATA_CHUNK = 3

This indicates a message which has 'struct remote_message_header'
followed by the data. The idea of this new type, instead of
REMOTE_MESSAGE, is that we treat the payload of REMOTE_DATA_CHUNK as
totally opaque and thus avoid the extra data copies inherent in
defining the payload to be an XDR byte array. So my idea would have
24 bytes overhead per chunk instead of your four. It would also allow
us to maintain concurrency: other threads can be making RPC calls over
the same socket, and those calls can be interleaved with individual
data chunk messages.

> These are the relevant points of the file transfer system:
>
> - At the API level, you pass in filenames. The caller is responsible
> for creating a named pipe in the filesystem, or passing in names like
> "/dev/fd/N".

That has the problem, though, that you can't necessarily assume that
the file handle you have has the data in the same encoding you want to
process it in. In the case of libvirtd invoking a libvirt API to handle
an RPC request, the data is coming in off the client socket and thus
needs passing through SASL/TLS decryption.
To do this with an API taking a filename, you'd need to create a named
pipe, read off the socket, write into the pipe, and then pass the pipe
name to the API, which adds several more data copies. With the RAM size
of VMs this will have a significant impact on CPU & memory bandwidth
utilization during migration. If we can pass the data directly from
SASL/TLS decryption to the driver, then we can limit ourselves to 2
data copies in the libvirt space. Normal RPC calls have 3 copies in
libvirt, the 3rd coming from the XDR format deserialization, but we
avoid the third with the custom message type for data streams.

> - File transfers are sent using chunked encoding. The key was to
> allow cancellation *initiated from either side* (not as easy as it
> seems). So if an error occurs at either end, the transfer can be
> stopped almost immediately, and synchronization can be reestablished.
> The details are in the link above.

Yes, those are the points that are particularly fun / interesting. It
looks like the scenarios you've identified there all match up to those
I've been worrying about. So that's good reassurance that I'm thinking
along the right lines. I reckon the extra 20 bytes overhead per chunk
of using an explicit message type, instead of just sending a series of
len+payload chunks, is a worthwhile tradeoff in libvirt's case to allow
better message interleaving on the socket.

Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
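
P.S. To make the REMOTE_DATA_CHUNK framing concrete, here is a rough
sketch in C of how a chunk might be put on the wire: a fixed 24-byte
header followed by the payload as raw bytes, with no XDR encoding of
the data itself. It assumes the header is six big-endian 32-bit words
(prog, vers, proc, direction, serial, status), which is what gives the
24 bytes. The names chunk_header, chunk_encode and chunk_decode are
made up for illustration and are not libvirt APIs. It also assumes the
total frame length is known from some outer length prefix, which is
not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message types from the mail; REMOTE_DATA_CHUNK is the proposed one. */
enum remote_message_direction {
    REMOTE_CALL = 0,       /* client -> server */
    REMOTE_REPLY = 1,      /* server -> client */
    REMOTE_MESSAGE = 2,    /* server -> client, asynchronous [NYI] */
    REMOTE_DATA_CHUNK = 3  /* header followed by opaque payload */
};

/* ASSUMPTION: six 32-bit header fields, i.e. 24 bytes on the wire. */
struct chunk_header {
    uint32_t prog, vers, proc, direction, serial, status;
};

#define CHUNK_HEADER_LEN 24

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write header + raw payload into out. Returns the number of bytes
 * written, or 0 if out is too small. The payload is copied as-is. */
size_t chunk_encode(unsigned char *out, size_t cap,
                    const struct chunk_header *h,
                    const void *payload, size_t len)
{
    assert(payload != NULL || len == 0);
    if (cap < CHUNK_HEADER_LEN || len > cap - CHUNK_HEADER_LEN)
        return 0;
    put_u32(out + 0, h->prog);
    put_u32(out + 4, h->vers);
    put_u32(out + 8, h->proc);
    put_u32(out + 12, h->direction);
    put_u32(out + 16, h->serial);
    put_u32(out + 20, h->status);
    if (len > 0)
        memcpy(out + CHUNK_HEADER_LEN, payload, len);
    return CHUNK_HEADER_LEN + len;
}

/* Parse the header and return a pointer to the payload inside the
 * input buffer (no copy). Returns NULL if the frame is too short. */
const unsigned char *chunk_decode(const unsigned char *in, size_t in_len,
                                  struct chunk_header *h,
                                  size_t *payload_len)
{
    if (in_len < CHUNK_HEADER_LEN)
        return NULL;
    h->prog = get_u32(in + 0);
    h->vers = get_u32(in + 4);
    h->proc = get_u32(in + 8);
    h->direction = get_u32(in + 12);
    h->serial = get_u32(in + 16);
    h->status = get_u32(in + 20);
    *payload_len = in_len - CHUNK_HEADER_LEN;
    return in + CHUNK_HEADER_LEN;
}
```

Since chunk_decode hands back a pointer into the receive buffer, the
payload could in principle go from SASL/TLS decryption to the driver
without a further copy. Presumably the serial field is what a reader
would use to tell which stream a chunk belongs to when other RPC calls
are interleaved on the same socket, but that part is a guess.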