----- Original Message -----
> Marc-André,
>
> Thanks for the comments! I'll certainly follow your advice. About the wiki
> - how do I create an account there? I tried
> http://www.spice-space.org/wiki/index.php?title=Special:UserLogin&returnto=Main_Page

Same for me. spice-space.org was recently moved to a different server, which
might explain it. Anyway, you can use other tools (e.g. Google Docs); the
document can later be referenced from, or copied into, the spice-space.org wiki.

> but it returns an empty page for me...
>
> A few comments below.
>
> On Tue, Oct 15, 2013 at 2:09 PM, Marc-André Lureau <mlureau@xxxxxxxxxx>
> wrote:
> >
> > ----- Original Message -----
> >> Hello Spice developers,
> >>
> >> I want to introduce my idea of a Virtual Media Controller (VMC),
> >> enhancing support for IP telephony in Spice-based VDI. I'm hoping for
> >> your feedback!
> >> This is only a concept, very high level, with no proof of concept
> >> implemented yet. The concept is divided into 3 levels: VMC API,
> >> VMC Advanced and VMC Ultimate. The amount of unknowns increases with
> >> each level. VMC API seems fairly straightforward and doable; the
> >> others are riskier and already have known open issues that question
> >> whether they can be implemented at all (and probably many more I'm
> >> not aware of yet).
> >>
> >> *Why is VMC needed?*
> >> The main problem of IP telephony software running in VDI is media
> >> stream hairpinning at the VDI server. Say Alice and Bob work in
> >> office O2 and connect to their virtual desktops on a VDI server in
> >> the main office O1. They use softphones running in the VMs to make a
> >> p2p audio call (video behaves similarly, it just adds unnecessary
> >> complication to this example). Let's look at the route of Alice's
> >> outgoing audio stream. Audio is captured from the microphone on her
> >> PC/thin client, then encoded by the Spice client and sent over a
> >> Spice channel to the VM, where it is decoded to PCM and presented as
> >> the source for the virtual microphone.
> >> The softphone then encodes it once again and sends it to its peer,
> >> Bob's softphone, running in another VM on the same VDI server. There
> >> the stream is decoded once again and played into the virtual speaker,
> >> which in turn encodes it and sends it over a Spice channel to the
> >> Spice client on Bob's PC, where it is decoded and finally played out
> >> on a real headset.
> >>
> >> We can see 2 major issues with this scheme:
> >> 1. The media stream travels via the VDI server, not p2p. So even if 2
> >> people in office O2 are making a call, the traffic goes through the
> >> VDI server in office O1. This introduces extra delay into the
> >> conversation, potentially increases jitter and packet loss (depending
> >> on the network), and adds extra network load.
> >> 2. The media stream is transcoded (decoded and then re-encoded) at the
> >> VDI server twice (4 times if we count Bob's stream too!). This means
> >> extra CPU usage on the VDI server, effectively reducing VM density. It
> >> also degrades quality if lossy codecs are used.
> >>
> >> *VMC Solution*
> >> The most adequate solution to both issues is to make the conversation
> >> p2p, removing the VDI server from the route entirely. So the only
> >> question is how to actually do it.
> >> The first part of the VMC idea is to introduce a media engine on the
> >> client side, and an API for softphone developers to control this
> >> engine. We may think about the following components:
> >> 1. VMC Agent - a component providing a media-handling API for
> >> applications running in the virtual machine. [It would probably need
> >> to work through the Spice agent, or by similar means - adding a new
> >> virtual device to qemu]
> >> 2. VMC Engine - a media engine running on the user's client/PC. It
> >> does the actual media handling and is controlled by commands from the
> >> VMC Agent.
> >> 3. VMC Transport - a "component" implementing the connection between
> >> the agent and the engine. The actual design is TBD; this is some sort
> >> of RPC over the Spice connection.
> >> 4. VMC OverlayRenderer - this advanced component is needed for video
> >> support only. It integrates local video rendering inside the virtual
> >> session window.
> >>
> >> Softphone developers would need to use the VMC Agent API as the media
> >> engine for their application - so changes in the softphone are
> >> required.
> >
> > Although I don't know Telepathy in detail, it looks like what you
> > describe, except that the audio/video stream is proxied, and decoded
> > directly in the client. Is that correct?
> >
>
> What do you mean by 'media stream is proxied'? We need P2P
> connections, avoiding any proxy servers (within an IP network, that
> is). So the stream is delivered directly and decoded in the client,
> completely bypassing the VDI server.

This is not always possible, so I suggest starting with proxying before
doing p2p.

> > 1. agent: telepathy session & API
>
> I have a generic agent in mind, not tied to a particular softphone...
> What about a D-Bus-based common API, plus GStreamer, VLC and Google WebRTC
> VoE&ViE wrappers as shared libraries? At start, the softphone/media app
> just links to our .so instead of its normal media engine - and gets
> everything working, without even knowing that media is processed on
> the client...

I think Telepathy was supposed to be very generic (in fact, IIRC it was
supposed to be just an interface spec), but given the complexity of the
VoIP stack, that's just a dream. Feel free to propose something else,
though; I was basically making an analogy.

>
> > 2. engine: gstreamer
> > 3. tbd (rtp?), dedicated spice channel
>
> Certainly not RTP, as this channel isn't for media transfer but for
> RPC - the agent calling functions of the remote media engine.
>
> > 4. internal to the spice client
>
> Agreed.
>
> >> *VMC Advanced*
> >> A more general problem can be posed: make arbitrary softphones running
> >> in VMs work without VDI hairpinning. Arbitrary means without code
> >> changes in these softphones.
> >> A solution to this problem adds much more value to Spice VDI, as any
> >> third-party application, including commercial ones like Skype, would
> >> be supported. Skype may be a bad example... But modern enterprise SIP
> >> or H.323-based softphones may be good ones (MS Lync, to name one).
> >>
> >> But first of all, let's look at an interesting yet mostly
> >> non-commercial case - a Linux VM. In this case there is a chance of
> >> implementing the Agent API to follow the APIs of widespread media
> >> engines - GStreamer, VLC (what else?). This way we'd be able to
> >> support arbitrary media apps based on these engines.
> >
> > Yes, we discussed this for video passthrough. Having a GStreamer
> > passthrough would be quite awesome, although limited to very few use
> > cases, unfortunately, since most of the time the decoded video is
> > post-processed, and there are relatively few GStreamer apps among all
> > the media apps. Also, you have issues on the client side, like codec
> > support (which can be sidestepped by saying that spice doesn't ship the
> > problematic codecs itself, but then the story is not fun for Windows
> > and Mac users).
> >
> >> [Notes:
> >> 1. If we add Google WebRTC media engine bindings, softphone developers
> >> who use this API should be able to add support for our system fairly
> >> straightforwardly.
> >> 2. GStreamer, VLC and WebRTC are cross-platform, so implementing their
> >> APIs may help enable support for some softphones in Windows VMs
> >> as well]
> >>
> >> One major part that needs additional work is signaling. The issue is
> >> the following: when a communication channel is established between 2
> >> parties, they exchange their IP addresses in the signaling messages.
> >> The softphone in the VM will advertise its virtual IP address in this
> >> situation - but we need the client to be the receiving end, so we
> >> need the client's IP address to appear in the softphone's messages.
> >> And we want our solution to be as signaling-protocol agnostic as
> >> possible, i.e. parsing and changing IP addresses in signaling messages
> >> isn't an option (and signaling traffic is usually protected by TLS
> >> anyway). Dealing with this is a big open question (up for a networking
> >> guru!). I'd love any comments / possible solutions for this!
> >>
> >> How I see this problem:
> >> 1. On a VDI server with real address IP1 there is a VM with some
> >> address IP2 (NAT or not - not specified).
> >> 2. An application (the softphone) is running in the VM.
> >> 3. The user connects via the Spice client, from a client/PC with
> >> address IP3.
> >> 4. We need to trick the softphone into thinking it is running on the
> >> machine with IP3.
> >> 5. Otherwise, the softphone's signaling should continue to work
> >> normally.
> >> 6. All other applications in the VM should continue to work normally
> >> using address IP2.
> >> Variables to play with: NAT, virtual network driver, configuration of
> >> the softphone (we can expect it uses particular ports for media).
> >>
> >> My only seemingly implementable idea works only for softphones that
> >> support ICE (NAT traversal), and only when there is a
> >> STUN-traversable NAT (i.e. not a symmetric one). The workflow:
> >> 1. Once the Spice client connects, it also establishes a connection
> >> with a fake STUN server and tells it about the translation IP2->IP3.
> >> 2. Once the softphone attempts to make or receive a call, it asks the
> >> STUN server for a candidate IP address - and receives IP3.
> >> The fake STUN server implementation is TBD.
> >>
> >> Weak spots:
> >> a) What about other applications relying on this 'STUN' server? They
> >> probably won't work.
> >> b) What if there is no NAT at all? The softphone will detect P2P
> >> connectivity and probably won't use ICE...
> >> c) What about softphones that do not support ICE?
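For illustration, step 2 of the fake STUN workflow could look roughly like the minimal Python sketch below. It assumes plain RFC 5389 message framing and that the fake server sees the VM's own address IP2 as the request source; the translation table and the helpers `register_translation` and `build_binding_response` are hypothetical, and socket handling is omitted. It answers a Binding Request with a Binding Success Response whose XOR-MAPPED-ADDRESS reports IP3 instead of IP2, which is what an ICE-capable softphone would then use as its server-reflexive candidate.

```python
import ipaddress
import struct

# Constants from RFC 5389 (STUN).
MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
FAMILY_IPV4 = 0x01

# Hypothetical translation table, filled when a Spice client connects:
# VM address (IP2) -> client address (IP3).
translations = {}


def register_translation(vm_ip, client_ip):
    """Workflow step 1 (hypothetical helper): record IP2 -> IP3."""
    translations[vm_ip] = client_ip


def build_binding_response(request, src_ip, src_port):
    """Workflow step 2: answer a Binding Request, reporting IP3 instead of IP2.

    src_ip/src_port are the source address of the UDP datagram. VMs without
    a registered translation get their real address back, just as a normal
    STUN server would return it.
    """
    if len(request) < 20:
        raise ValueError("too short for a STUN header")
    msg_type, _length, cookie = struct.unpack("!HHI", request[:8])
    if msg_type != BINDING_REQUEST or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Request")
    transaction_id = request[8:20]  # must be echoed back unchanged

    mapped_ip = translations.get(src_ip, src_ip)
    # XOR-MAPPED-ADDRESS obfuscates port and address with the magic cookie.
    x_port = src_port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(mapped_ip)) ^ MAGIC_COOKIE

    # Attribute value: reserved byte, address family, X-Port, X-Address.
    value = struct.pack("!BBHI", 0, FAMILY_IPV4, x_port, x_addr)
    attribute = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    # The header's length field counts only the attributes, not the header.
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attribute), MAGIC_COOKIE)
    return header + transaction_id + attribute
```

Only the address is rewritten; the port is echoed unchanged, on the assumption that the softphone's media ports (the "particular ports for media" mentioned above) are mapped 1:1 on the client. The sketch does nothing about weak spots a) to c).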
> >>
> >> I also thought about a solution involving custom virtual network
> >> drivers, but it seems impossible to separate the behavior of the
> >> softphone from that of the rest of the system at such a low level...
> >
> > It looks like you have thought about this a lot. You should start
> > documenting this on the spice-space wiki
> > http://www.spice-space.org/page/PlannedFeatures.
> >
> > Also, since this feature is quite specific to the VoIP domain, I think
> > it is best to ask VoIP people about the tricks you can do, on the
> > Telepathy or Ekiga mailing lists, for example.
> >
> >> *VMC Ultimate*
> >> This is the final step: covering softphones that aren't based on a
> >> common media engine (things like Skype could probably be covered).
> >> The main idea is to detect which media action a softphone is
> >> performing - and actually perform it in the remote VMC Engine. The
> >> approach assumes the signaling issue is solved somehow. Amended
> >> virtual audio, video and network devices/drivers are needed. For
> >> example, let's look at an incoming audio call scenario where the user
> >> picks up the call. The VMC Engine on the client detects the presence
> >> of an incoming audio stream, parses out the codec (pcap?) and makes
> >> the virtual network send a sample stream to the softphone. Once the
> >> softphone starts playing out, the virtual audio device detects the
> >> decoded sample in the output. This is the signal for the VMC Engine
> >> to start playing out the incoming audio stream. And once the
> >> softphone starts reading a sample from the virtual microphone and
> >> sending an encoded audio stream to the output, this can be detected
> >> at the virtual network level and the VMC Engine would start the
> >> actual outgoing audio stream.
> >>
> >> The same can be done for video - we need to detect the presence of a
> >> decoded video stream sample and notify the VMC Engine to start
> >> rendering - and the stream should be rendered over the appropriate
> >> window using the VMC OverlayRenderer. Samples need to be really simple
> >> (e.g. a black picture with a label as the video stream) - so that they
> >> can be easily generated (or even pre-loaded) and detected, and
> >> processing them should take as little server CPU as possible (this
> >> holds not for all codecs, only for those whose encoding/decoding
> >> complexity depends on the incoming data parameters; for adaptive
> >> codecs like H.264 SVC, smaller supported samples can be used than the
> >> actual video stream decoded in the VMC Engine...)
> >>
> >> Does this make any sense? I was inspired by Spice's current detection
> >> of video streams... It probably looks too complicated and risky.
> >>
> >> An obvious hard case for all these VMC schemes is encrypted media
> >> streams (usually SRTP).
> >
> > Indeed, that's what I was going to ask ;)
>
> AFAIK, most open-source softphones do not have SRTP yet... It's more of
> an enterprise feature, and a fairly limited one, as it protects only
> the "first hop", not the whole conversation.
> We need to think about an API-based solution for SRTP, e.g. an
> encrypt/verify plug-in on the client, with the agent API providing the
> keys.
>
> --
> Best regards,
> Fedor
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel