> On 19 Jun 2018, at 14:11, Frediano Ziglio <fziglio@xxxxxxxxxx> wrote: > >> >> On Fri, 2018-06-15 at 15:12 +0200, Marc-André Lureau wrote: >>> Hi >>> >>> On Fri, Jun 15, 2018 at 1:01 PM, Lukáš Hrázký <lhrazky@xxxxxxxxxx> wrote: >>>> On Fri, 2018-06-15 at 12:24 +0200, Marc-André Lureau wrote: >>>>> Hi >>>>> >>>>> On Fri, Jun 15, 2018 at 12:16 PM, Lukáš Hrázký <lhrazky@xxxxxxxxxx> >>>>> wrote: >>>>>> On Thu, 2018-06-14 at 21:12 +0200, Marc-André Lureau wrote: >>>>>>> Hi >>>>>>> >>>>>>> On Tue, Jun 5, 2018 at 5:30 PM, Lukáš Hrázký <lhrazky@xxxxxxxxxx> >>>>>>> wrote: >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> following is my attempt at solving the ID issues with >>>>>>>> monitors_config >>>>>>>> and streaming. The concept is as follows: >>>>>>>> >>>>>>> >>>>>>> Before introducing a new solution, could you describe the "issues >>>>>>> with >>>>>>> monitors_config and streaming"? I am not following closely enough >>>>>>> the >>>>>>> mailing list probably, so a recap would be welcome. I'd like to >>>>>>> understand the shortcomings of the current messages, and see if we >>>>>>> are >>>>>>> on the same page about it. Thanks! >>>>>> >>>>>> Right, sorry about that! >>>>>> >>>>>> The issue is when having a plugin a for the streaming agent that is >>>>>> using a HW-accelerated encoder. The typical case is that you have a >>>>>> QXL >>>>>> monitor in the guest which shows the boot and then you have a >>>>>> physical >>>>>> HW device (either directly assigned to the VM or a vGPU) which is >>>>>> configured in the X server. >>>>>> >>>>>> Once the X server starts, the QXL monitor goes blank and you get a >>>>>> second monitor on the client with the streamed content from the >>>>>> streaming agent plugin. >>>>>> >>>>>> The problem is the code and the protocol assumes (amongst other >>>>>> assumptions) that all monitors are configured in X. The monitors on >>>>>> the >>>>>> client are put into an array and the index is used as the ID >>>>>> throughout >>>>>> SPICE. 
So for the case above, the QXL monitor is ID 0 and the >>>>>> streamed >>>>>> monitor is ID 1. >>>>> >>>>> >>>>> Since the "streamed monitor" replaces the QXL monitor, wouldn't it >>>>> make sense for them to share the same ID? This would be handled by the >>>>> server/guest transparently. >>>> >>>> Well... for one, it does not entirely replace it. From a user >>>> experience PoV, yes, it makes sense to "replace" one monitor for the >>>> other. We've even considered switching the content of the display >>>> channel from QXL to the streamed one. But in the system, they are >>>> different monitors on different channels, so it's a bad idea to give >>>> them the same ID. And the ID we're talking about is actually (for the >>>> monitors config messages): >>> >>> So if it's the same from use PoV, the client API shouldn't need to change. >> >> I didn't say it's the same from the user PoV, not entirely. We can >> assume he want's to either be looking at the QXL monitor or the >> streamed one, i.e.: >> >> 1. VM is booting, we are showing QXL monitor >> 2. X started, we switch to showing the streamed monitor > > Note: this is also due to X limitation not supporting multiple devices, > on Windows probably you'll have both used at the same time and probably > user will manually disable QXL (as wants to use the other) > >> 3. The user switches to a different TTY, we switch back to QXL >> etc... >> >> This is for the simple case of one QXL monitor and one streamed >> monitor. But we are still "hiding" from the user that his VM actually >> has two graphics devices, the QXL one and a physical (or virtual) GPU. >> >> I once advocated the approach to switch the monitors for the users, but >> now I think it would actually complicate things more and could >> potentially cause problems. >> > > I think the possibility to have the switch would be interesting and help > but I think the actual protocol is limiting and potentially complicating > stuff. 
> >>> The client only cares about about having a specific set of monitors >>> configuration. The way the server/guest handled them is not its >>> business, it expects the best configuration. > > True, the client should implement the protocol and server should use > the protocol as best as it can. This does not however mean that the > current protocol is perfect and using it as it is don't limit us > and make the code a bunch of spaghetti code in order to make protocol > happy. > >> >> That is true, but the monitors configuration needs to reflect the >> actual configuration of the guest. If we mix it up for the clients >> convenience, problems... :) >> >>> So far, we have the current simplified model (ie other more complex >>> combinations are not supported): >>> >>> qxl device, display channel 0, >>> monitor 0: >>> monitor 1: >>> monitor x.. >>> >>> >>> or >>> >>> qxl device, display channel 0, >>> monitor 0: >>> qxl device, display channel 1, >>> monitor 0: >>> .. >>> >>> Feel free to correct me, I haven't been looking into this for many years >>> now. >> >> Correct. >> > > True. Note that this model however have physical (sockets and connections) > repercussions which could be hard to maintain. > If we decide to keep the current protocol and have one device we end up > with a huge surface 0 with all monitors in it having to mux and unmux streamings > and commands in both server and clients having to implement fair queueing and > synchronization just to maintain an old design. > Said that this approach would be prohibitive and looking at X implementation > requiring a single surface this means that if streaming devices are used we must > limit to one QXL with 1 monitor only and "map" all monitors to different channels > (with one monitor each). 
>
>>>>
>>>> server -> client: a pair (channel_id, monitor_id)
>>>>
>>>> client -> server: channel_id + monitor_id, converted to an array index
>>>> under the assumption I mentioned
>>>>
>>>> So we can't really make it the same even if we wanted.
>>>
>>> With multi-head/xrandr qxl:
>>>
>>> qxl device, display channel 0,
>>>   monitor 0:
>>>   monitor 1:
>>>   monitor x..
>>> gpu device, stream channel 0,
>>>   maps to display channel 0, monitor 0
>>> gpu device, stream channel 1,
>>>   maps to display channel 0, monitor 1
>>> ...
>>>
>>> With multihead/xrandr qxl & gpu:
>>> qxl device, display channel 0,
>>>   monitor 0:
>>>   monitor 1:
>>>   monitor x..
>>> gpu device, stream channel 0,
>>>   maps to display channel 0, monitor 0
>>>   maps to display channel 0, monitor 1
>>> ...
>>>
>>> With multiple QXL devices:
>>>
>>> qxl device, display channel 0,
>>>   monitor 0:
>>> qxl device, display channel 1,
>>>   monitor 0:
>>> gpu device, stream channel 0,
>>>   maps to display channel 0, monitor 0
>>> gpu device, stream channel 1,
>>>   maps to display channel 1, monitor 0
>>
>> But, regardless of multihead/multiple devices, there is no guarantee
>> that the number of QXL monitors is going to match the number of GPU
>> device monitors. In fact, the most probable scenario is one QXL monitor
>> and multiple GPU device monitors.
>>
>
> This is not an issue; it would just be a matter of allocating enough
> channels to support all the monitors we need.
>
>> I don't think it makes sense to have more than one QXL monitor anyway.
>> And if you did, it makes no sense to map them to GPU device monitors
>> besides the first one, I guess...
>>
>> Also, it is unclear what you mean by the "mapping", i.e. how exactly
>> would you implement it?
>>
>
> Simply, you present to the client a (channel_id, monitor_id) pair that can
> differ from the one used on the server.

I don’t know if this was Lukas’ question, but how would we define this mapping? That was my original question to Marc-André.
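To make the ID problem discussed above concrete, here is a minimal sketch in Python of why collapsing the pair into channel_id + monitor_id only works under the stated assumption (either one channel with several monitors, or several channels with one monitor each). Only the channel_id + monitor_id formula comes from this thread; the function names and the sample monitor lists are hypothetical and are not taken from spice-gtk or spice-server.

```python
from typing import Dict, List, Tuple

# A monitor as the server announces it to the client: (channel_id, monitor_id).
MonitorKey = Tuple[int, int]


def flat_display_id(channel_id: int, monitor_id: int) -> int:
    """Collapse the pair into a single ID, as the client -> server path does."""
    return channel_id + monitor_id


def colliding_keys(monitors: List[MonitorKey]) -> List[Tuple[MonitorKey, MonitorKey]]:
    """Return pairs of distinct monitors that end up with the same flat ID."""
    seen: Dict[int, MonitorKey] = {}
    collisions = []
    for key in monitors:
        flat = flat_display_id(*key)
        if flat in seen:
            collisions.append((seen[flat], key))
        else:
            seen[flat] = key
    return collisions


# Hypothetical layouts, for illustration only.
single_channel = [(0, 0), (0, 1), (0, 2)]           # one channel, many monitors
one_monitor_per_channel = [(0, 0), (1, 0), (2, 0)]  # many channels, one monitor each
mixed = [(0, 0), (0, 1), (1, 0)]                    # neither assumption holds

for name, layout in [("single_channel", single_channel),
                     ("one_monitor_per_channel", one_monitor_per_channel),
                     ("mixed", mixed)]:
    print(name, colliding_keys(layout))
```

In the two layouts that respect the assumption no two monitors share a flat ID, while in the mixed layout (0, 1) and (1, 0) both collapse to 1. Keeping the pair intact avoids the ambiguity regardless of layout.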
> >>> If you want to support QXL devices that should not be associated with >>> a gpu/stream channel, then you should be able to flag them. > > Not clear what do you want with this. A QXL device should be either be > seen by the client or not. You need to flag if seen or not. > >>> >>> If you need a manual association of stream channel with qxl/monitor, >>> this could be done with a manual configuration. > > I don't see a case where I would need a manual configuration, I mean, > should be configurable within the guest disabling/enabling devices > and/or monitors. What about the case where you have physical monitors also connected to the VM? > >>> >>> Imho, we should avoid making things more complex. Testing is hard >>> enough. We should actually take the simplest approach possible (the >>> same decision we took before, it's already a mess to deal with, as you >>> noticed) > > You are stating that the current situation is complex and also we need > to keep things simple. Looks like a bit contradictory. > Maybe the current situation is complex, or feel more complex than it > should be, for the choices done and current implementation. > From my point of view is just that client need to tell "I want this > monitor this size" and "I'm moving the mouse here on this monitor". > The problem is defining "this monitor" at protocol level from client > to server, including messages to agent. > >> >> I fully agree with the notion :) However, we are already making things >> much more complex by adding the GPU device streaming feature. It breaks > > I don't think the problem is related to streaming, more about devices > which are not QXL. Technically correct, but at the moment, this h/w streaming is the only case, right? > With only QXL all old assumptions are easy to maintain, > the problem is the new devices. For instance having QXL 0 as Xrandr ID 0 > and QXL 1 as Xrandr ID 1 is pretty easy, is the same driver and actually > we also manage (at code level!) 
the guest device driver. Just imagine if > the driver changes the order or physical device scanning having QXL 0 as > Xrandr 1 and QXL 1 as Xrandr 0! This would break the channel_id+monitor_id > formula. This does not happen as we maintain it, but we (SPICE) don't > maintain Intel/Nvidia/ATI drivers or Xorg/Wayland/Windows! > > I'm relative new in this group (3 years) compared to these stuff, > some are possibly not in current git repos, but I think that more or > less the history is: > - before multimonitor. No much issues the monitor was the monitor; > - start adding multi monitor support, only one device was used. The > problem was only sending the ID of the monitor which was 0, 1, ... > Added a monitor_config message in the SPICE protocol from server > to client. The client could resize the monitors sending a message > to the agent through the server, server was not involved. As there > was only a QXL device and no other devices there was no reason to send > information for channel/qxl device and monitor IDs matched Xrandr IDs; > - for some reasons on Windows was not great, somebody decided to use > multiple QXL devices each with one monitor. As a workaround not having > the channel in the protocol to the agent the channel_id+monitor_id formula > was introduced to create an ID fitting into the existing one (for the > agent); > - somebody realized that on Linux was possible to handle resolution > changes talking directly to the device making possible resize without > using the agent. This was implemented with a change in device modes > (rom/ram) and an interrupt to the card handled by our Linux kernel > driver. As on Linux there is only one device, spice-server just > "redirects" the message for the agent sent by the client to the QXL > card. This works on Windows as Windows driver does not handle this > interrupt/feature (or it would replicate monitor settings on all > cards!). > Possibly some wrong assumptions, please correct me. 
> Note: is not clear to me the history of the mouse events, currently > spice-server sends messages to the agent for the mouse. Thanks for the write up. It makes sense to me. If anybody knows this to be wrong, please speak up ;-) > >> previous assumptions and adds more stuff on top of it. And I think your >> suggestion adds even more complexity than necessary. Perhaps not in the >> client, but the "mapping" code, which sounds rather non-trivial, needs >> to go somewhere, presumably the server? >> > > I think too that bending to the old "design" will do stuff more > complex. > >> It is actually my intention to add as little complexity as possible >> with my patches. The fact is, both the current code and the protocol is >> not robust enough and need to be fixed. And even if they were robust > > I would not speak about robustness and fix, from my point of view looks > more a problem of flexibility. Simply we now have new use cases to > support. > >> enough, the protocol would need to be extended either way. On top of >> that, the backwards compatibility turns the solution into a very >> complex one. But if we ever could deprecate and drop the old code >> paths... >> > > I have same sensation, not changing protocol will make thing much > more complicated and possibly limited. > >>>>>> The more pressing issue with this is that a mouse motion event is >>>>>> sent >>>>>> with a display_id 1 from the client, but when it reaches the guest, >>>>>> the >>>>>> correct monitor is ID 0, as it is the only one. >>>>>> >>>>>> The other thing this breaks is monitors_config messages that >>>>>> enable/disable displays and change the resolution. 
Besides the same >>>>>> "shift" happening here as well, another problem is the information >>>>>> that >>>>>> the monitors belong to different devices (or channels) is erased when >>>>>> they're all put into the single array (under the assumption there is >>>>>> either one channel with multiple monitors or multiple channels with >>>>>> one >>>>>> monitor each). >>>>> >>>>> Correct and so far was a fine solution. >>>> >>>> I respectfully disagree. Doing the channel_id + monitor_id I mentioned >>>> above is asking for trouble, which is exactly what we have now. >>> >>> With the limitation we decided, we avoid the biggest troubles though. > > We avoid to change the protocol but to implement the new use cases > we introduce potentially many assumptions and code complexity. > Not saying that is hard to discuss new code thinking about design > which nobody wrote down and that is bound to old code which nobody > fully remember. > >> >> I take it you were involved with the original implementation then... >> I'm sorry for being so critical, it is not my intention to be mean :) >> >> However, I am not criticising the limitations you chose to impose, but >> the implementation. I find two problems here: >> >> 1. Not keeping the IDs of the monitors_config messages as a >> (channel_id, monitor_id) pair and instead abusing the limitation you >> introduced to change it to a singe ID. I sort of consider it a hard >> rule you always keep your unique IDs intact and always keep them with >> the data. That way they're there when you need them and they work >> (because you didn't mess with them :)). >> > > I think this single ID came from the history I wrote above (that could > be partly wrong). > I personally would add channel_id and monitor_id to the new messages, > this to make possible to send the correct modes to the card in every > possible case. Saving few bytes for these messages (which are not > that frequent) is not worth it. 
This was my suggestion as well, although I was suggesting putting fields in the existing ID, where one of the fields would be 0 in “compatible mode” and non-zero if we have device_id / monitor_id / channel_id information. At the time, I suggested that the non-zero field would indicate which boot phase we were in (I called that a “generation”), i.e. whether we expected to be dealing with QXL or some VGPU. But it’s perfectly fine to add explicit fields. As long as all elements in the chain can agree on the mapping, that’s fine with me.

>
>> 2. The change in the IDs and the actual creation of the monitors_config
>> list leaks over the spice-gtk API to the client application (via the
>> spice_main_update_display{,_enabled}() function). It is unnecessary,
>> gives more opportunity to the client app to break it, and any extension
>> of the monitors_config message requires a spice-gtk API change.
>>
>> You could have imposed the limitation you did (I'm not entirely sure
>> what was the reason) and still make the implementation more robust by
>> not doing the above...
>>
>> And for these reasons this patch series is more complicated than it
>> could have been (though as I said, the protocol extension, i.e. adding
>> the output_id, would still be necessary).
>>
>> I'm not saying this to pick on you, I'm not even sure how you were
>> involved (didn't go digging who wrote the code), but more as an
>> explanation and so that we all can learn from it for the future :)
>>
>
> Usually complex code is the result of many passages and possibly
> shortcuts, usually involving different people and undocumented
> stuff. Picking on people is often unfair.

In all fairness, I think Lukas took a triple dose of carefulness to avoid picking on anybody :-) But I agree we don’t want to ignore Marc-André’s input on the topic, nor do we want to hastily discount Lukas’ analysis of the problem.
The insight I got from Marc-André was: remapping monitors on the fly might be easier and more user-friendly. The answer I read from Lukas was that he initially thought the same, and changed his mind. I’d like to know more.

>
>>>>> Either you do multiple monitors over a single channel, or you use
>>>>> multiple channels, each with one monitor.
>>>>
>>>> That is a limitation you can introduce, perhaps there was a reason for
>>>> it. But the code is taking shortcuts and borderline hacks because of
>>>> this, while doing it right wouldn't be much harder. I'm not looking to
>>>> blame or judge anyone, I don't know how this came into place. But what
>>>> it is now is (IMO) a bad design and I feel the need to say it after
>>>> working on it for quite some time ;)
>>>
>>> Doing it differently would lead to push the complexity higher up the
>>> stack, perhaps even to the user. We wanted to avoid that.
>
> I don't see how this could reach the user.

I think your response focuses on the protocol itself. But we know of user issues related to this problem. One of them is client-side window management.

From a user’s perspective, I think that having a single window whose content changes from text to graphical as we boot is preferable to having a second window that pops up when streaming starts. But even on that simple “user POV” point, there is a debate. I seem to recall that David, for example, considers that a text-only window is a feature.

Still from a user’s perspective, the real issue we are trying to solve is that the mouse cursor position is wrong, even in a single-monitor configuration, as soon as the streaming device and the QXL device have different resolutions, and that there is no obvious way to automatically transfer the resolution from VGPU to QXL. That is by far the highest-priority user-visible problem to solve. The next one down the list is to support multi-monitor configurations.
Yet, a couple of user POV observations:

- Resizing the guest by resizing the client window is a useful feature that works well with QXL. How do we want it to behave with a hardware VGPU where there may be resolution constraints?
- What about the case where we connect to a VM that is also connected to a real monitor? We can’t pick just any resolution in that case. What about identifying the monitors guest-side (e.g. should we include some EDID information in the messages to show them client-side, as was once discussed)? Forgive me if it’s in the patches and I missed it.
- What about the full-screen cases where the resolution is really imposed client-side?

To me, those are all important use cases to take into account in the design. At the moment, I am not sure how well the proposed design addresses these issues. For example, I would add fields in the new monitor config (or maybe flags shared between old and new) to share restrictions about window resizing (e.g. “resolution imposed by client”, “resolution imposed by guest”, etc.).

>
>>
>> Yeah, as I said, not sure what the reason was, but IMO you could have
>> had that and still make it much more robust.
>
> I cannot say a car has a bad design because it does not have the load
> of a big truck, just a different use case. We probably started using
> the car (to continue the metaphor) to load some stuff and we are now
> exceeding the limit. At the beginning we thought a car was enough :-)
>
>>
>> Oh and BTW, not sure how old this is, but I'm sure I'd have much worse
>> things to say about the code _I_ wrote say like 8 years ago ;)
>>
>>>>>> Hope this explains it well enough. It's all very complex and goes over
>>>>>> multiple interfaces (the network protocols as well as the spice-gtk
>>>>>> API), so the more brilliant ideas you'll have, the more welcome they
>>>>>> will be :)
>>>>>
>>>>> If we can avoid modifying all the layers and exposing the inner
>>>>> details, I would encourage the exploration of other solutions.
>>>>
>>>> I don't see how we can do that... And the inner details are already
>>>> quite exposed, unfortunately :(
>>>>
>>>> As I said, ideas welcome.
>>>>
>>>>>>>> 1. The streaming-agent sends a new StreamInfo message when it starts
>>>>>>>> streaming. The message contains the output_id of the streamed monitor.
>>>>>>>> The actual value is the index in the list of xrandr outputs. Basically,
>>>>>>>> the number should uniquely identify a monitor in the guest context
>>>>>>>> under X.
>>>>>>>>
>>>>>>>> 2. The server copies the number into the SpiceMsgDisplayMonitorsConfig
>>>>>>>> message and sends it to the client as part of the monitors_config
>>>>>>>> exchange.
>>>>>>>>
>>>>>>>> 3. The client stores the output_id in its list of monitor_configs. Here
>>>>>>>> I had to make significant changes, because the monitors_config code in
>>>>>>>> spice-gtk is very loosely written and also exposes quite a few
>>>>>>>> unnecessary details to the client app. The client sends the output_id
>>>>>>>> back to the server as part of the monitors_config messages
>>>>>>>> (VDAgentMonitorsConfigV2) and also uses it for the mouse motion event,
>>>>>>>> which also contains a display ID interpreted by the vd_agent. In the
>>>>>>>> end, the API/ABI towards the client app should remain unchanged.
>>>>>>>>
>>>>>>>> 4. The server passes the output_id in above-mentioned messages to the
>>>>>>>> vd_agent. The output_id is meaningless in the server context.
>>>>>>>> (Currently, it doesn't pass the monitors_config messages if there is a
>>>>>>>> QXL device that supports it, though. Needs more work.)
>>>>>>>>
>>>>>>>> 5. vd_agent:
>>>>>>>>     a) For mouse input, the output_id was passed in the original message,
>>>>>>>>     so no change needed here, it works.
>>>>>>>>
>>>>>>>>     b) If the server sends monitors_config to the guest, the vdagent will
>>>>>>>>     prefer to use monitors_configs with the output_ids set, if such are
>>>>>>>>     present. In that case, it ignores monitors_configs with the output_id
>>>>>>>>     unset. If no output_ids are present, it should behave as it used to.
>>>>>>>>
>>>>>>>> A couple of things to note:
>>>>>>>> - While I did copy the VDAgentMonitorsConfig to a different message for
>>>>>>>> backwards compatibility, I didn't do the same for
>>>>>>>> SpiceMsgDisplayMonitorsConfig, it remains to be done.
>>>>>>>>
>>>>>>>> - I didn't introduce any capabilities to handle the compatibility, also
>>>>>>>> remains to be done. Hopefully it is also clear it will be quite a
>>>>>>>> non-trivial job to do that :( Ain't gonna make the code prettier
>>>>>>>> either.
>>>>>>>>
>>>>>>>> For your convenience, you can also pull the branches below:
>>>>>>>> https://gitlab.freedesktop.org/lukash/spice-protocol/tree/monitors-config-poc
>>>>>>>> https://gitlab.freedesktop.org/lukash/spice-common/tree/monitors-config-poc
>>>>>>>> https://gitlab.com/lhrazky/spice-streaming-agent/tree/monitors-config-poc
>>>>>>>> https://gitlab.freedesktop.org/lukash/spice/tree/monitors-config-poc
>>>>>>>> https://gitlab.freedesktop.org/lukash/spice-gtk/tree/monitors-config-poc
>>>>>>>> https://gitlab.freedesktop.org/lukash/vd_agent/tree/monitors-config-poc
>>>>>>>>
>>>>>>>> All in all, it's big, complex and invasive. Note I did review the
>>>>>>>> emergency instructional video [1] and am therefore ready for any
>>>>>>>> bombardment you'll be sending my way :D (Don't hesitate to contact me
>>>>>>>> with questions either)
>>>>>>>>
>>>>>>>> Last minute note: I've noticed some of the patches are missing the
>>>>>>>> Signed-off-by line; since they are not for merging, this should not be
>>>>>>>> an issue...
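The selection rule in 5b can be illustrated with a minimal sketch in Python. This is not the vd_agent code from the patches: the MonitorConfig class, the select_configs function, and the use of None to stand for an unset output_id are all assumptions made for illustration, since the thread does not say how "unset" is encoded in VDAgentMonitorsConfigV2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonitorConfig:
    """Hypothetical stand-in for one entry of a monitors_config message."""
    width: int
    height: int
    output_id: Optional[int] = None  # None stands for "output_id unset" (assumption)


def select_configs(configs: List[MonitorConfig]) -> List[MonitorConfig]:
    """Prefer entries that carry an output_id; otherwise keep the old behaviour.

    If at least one entry has output_id set, entries without it are ignored.
    If none has it set, all entries are returned unchanged.
    """
    with_id = [c for c in configs if c.output_id is not None]
    return with_id if with_id else list(configs)


# Hypothetical inputs: a QXL entry without output_id plus one streamed monitor.
mixed = [MonitorConfig(1024, 768), MonitorConfig(1920, 1080, output_id=0)]
legacy = [MonitorConfig(1024, 768), MonitorConfig(800, 600)]

print([c.output_id for c in select_configs(mixed)])
print(len(select_configs(legacy)))
```

With the mixed input only the entry carrying output_id 0 survives, and with the legacy input both entries are kept, which matches "it should behave as it used to".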
>>>>>>>>
>>>>>>>>
>>>>>>>> Lukáš Hrázký (16):
>>>>>>>>   spice-protocol
>>>>>>>>     Add the StreamInfo message
>>>>>>>>     Create a version 2 of the VDAgentMonitorsConfig message
>>>>>>>>   spice-common
>>>>>>>>     add output_id to SpiceMsgDisplayMonitorsConfig
>>>>>>>>   spice-streaming-agent
>>>>>>>>     Send a StreamInfo message when starting to stream
>>>>>>>>   spice-server
>>>>>>>>     Handle the StreamInfo message from the streaming agent
>>>>>>>>     Use VDAgentMonitorsConfigV2 that contains the output_id field
>>>>>>>>   spice-gtk
>>>>>>>>     Rework the handling of monitors_config
>>>>>>>>     Remove the n_display_channels check when sending monitors_config
>>>>>>>>     Use an 'enabled' flag instead of status enum in monitors_config
>>>>>>>>     Use VDAgentMonitorsConfigV2 as the message for monitors_config
>>>>>>>>     Add output_id to the monitors_config
>>>>>>>>     Use the new output_id as display ID for the mouse motion event
>>>>>>>>   vd_agent
>>>>>>>>     vdagent: Log the diddly doo X11 error
>>>>>>>>     Improve/add some logging messages
>>>>>>>>     Use VDAgentMonitorsConfigV2 instead of VDAgentMonitorsConfig
>>>>>>>>     vdagent: Use output_id from VDAgentMonitorsConfigV2
>>>>>>>>
>>>>>>>> [1] https://www.youtube.com/watch?v=IKqXu-5jw60
>>>>>>>>
>
> Frediano
> _______________________________________________
> Spice-devel mailing list
> Spice-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/spice-devel