Re: [RFC] Virtual CRTCs (proposal + experimental code)

On Wed, 23 Nov 2011, Dave Airlie wrote:

> So another question I have is how you would intend this to work from a
> user POV, like how it would integrate with a desktop environment, X or
> wayland, i.e. with little or no configuration.


First, understand that when a virtual CRTC is created, it looks to the user like the GPU has an additional DisplayPort connector. At present I "abuse" DisplayPort, but I have seen that you pushed a patch from VMware that adds a Virtual connector type, so eventually I'll switch to that naming. The number of virtual CRTCs is determined when the driver loads and is a static configuration parameter. This does not restrict the user, because unused virtual CRTCs are just like disconnected connectors on the GPU. In the extreme case, a user could max out the number of virtual CRTCs (i.e. 32 minus the number of physical CRTCs), but in general the system needs to be booted with the maximum number of anticipated CRTCs. Run-time addition and removal of CRTCs is not supported at this time; that would be much harder to implement and would affect the whole DRM module everywhere.
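
To make "static configuration parameter" concrete, here is a minimal sketch of what such a load-time knob looks like in the GPU driver; the parameter name is illustrative, not necessarily what my tree uses:

    /* sketch: static, load-time count of virtual CRTCs in the GPU driver */
    static int vcrtcs;
    module_param(vcrtcs, int, 0444);
    MODULE_PARM_DESC(vcrtcs, "Number of virtual CRTCs to create at driver load");

which would then be set at boot time, e.g. "modprobe radeon vcrtcs=2".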

So now we have a system that has booted up, and DRM sees all of its real connectors as well as the virtual ones (as DisplayPorts at present). If no CTD device is attached to a virtual CRTC, its virtual connector is disconnected as far as DRM is concerned. Now userspace must call the "attach/fps" ioctl to associate CTDs with CRTCs. I'll explain shortly how to automate that and eliminate the burden from the user, but for now, please assume that "attach/fps" gets called from userland somehow.
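
For illustration, a userland call to such an ioctl would look roughly like this; the device node, ioctl number, and struct layout below are all illustrative, not the actual VCRTCM ABI:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/ioctl.h>

    /* hypothetical argument block: bind CTD #ctd_id to virtual CRTC
     * #crtc_id and pull frames from it at the given rate */
    struct vcrtcm_attach {
            unsigned int crtc_id;
            unsigned int ctd_id;
            unsigned int fps;
    };
    #define VCRTCM_IOC_ATTACH _IOW('V', 1, struct vcrtcm_attach)

    int main(void)
    {
            struct vcrtcm_attach a = { .crtc_id = 0, .ctd_id = 0, .fps = 60 };
            int fd = open("/dev/vcrtcm", O_RDWR);   /* hypothetical node */

            if (fd < 0)
                    return 1;
            return ioctl(fd, VCRTCM_IOC_ATTACH, &a) < 0 ? 1 : 0;
    }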

When the attach happens, VCRTCM generates a hotplug event to DRM, just as if someone had plugged in a monitor. Then, when Xorg starts, it will use the DisplayPort that represents a virtual CRTC just like any other connector. How it uses it depends on what xorg.conf says, but the key point is that this connector is no different from any other connector that the GPU provides and is thus treated as an "equal citizen". No special configuration is necessary once a CTD is attached.

If a CTD is detached and a new CTD attached, that is just like yanking out a monitor cable and plugging in a new one. DRM gets all the hotplug events, and the windowing system does the same thing it would normally do with any other port. If RANDR is called to resize the desktop, that also works, and X has no idea that one of the connectors is on a virtual CRTC. I also have another feature: when a CTD is attached, it can ask the device it drives for the connection status and propagate that all the way back to DRM (this is useful for CTD devices that drive real monitors, like DisplayLink).
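
That status propagation lands in the virtual connector's detect hook. A sketch of the idea, where vcrtcm_ctd_connected() is an illustrative helper and not an actual symbol in my tree:

    #include <drm/drm_crtc.h>

    /* sketch: the virtual connector asks VCRTCM whether a CTD is attached
     * and, for CTDs that drive real monitors (e.g. DisplayLink), whether
     * that monitor is actually plugged in */
    static enum drm_connector_status
    vcrtc_detect(struct drm_connector *connector, bool force)
    {
            if (vcrtcm_ctd_connected(connector))
                    return connector_status_connected;
            return connector_status_disconnected;
    }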

So now let's go back to the attach/fps ioctl. To attach a CTD device, this ioctl must happen as the result of some policy. That can be done by having the CTD device generate a UDEV event when it loads, for which one can write policies that determine which CTD device attaches to which virtual CRTC. Ultimately that becomes a user configuration, but it's no different from what one has to do now with UDEV policies to customize a system.
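
Such a policy could be a one-line udev rule along these lines; the subsystem name and helper path are illustrative only:

    # hypothetical rule: when a CTD device registers, run a helper that
    # issues the attach/fps ioctl to bind it to a free virtual CRTC
    ACTION=="add", SUBSYSTEM=="vcrtcm", RUN+="/usr/local/bin/vcrtcm-attach %k"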

Having explained this, let's take the hotplug example that you put up on your web page and redo it with virtual CRTCs. It would work like this:

1. Boot the system and tell the GPU driver to create a few virtual CRTCs.

2. Bring up Xorg with no DisplayLink dongles plugged in.

3. Plug in the DisplayLink dongle. The CTD driver loads as a result of the hotplug. (Right now UDLCTD is a separate driver, but as we discussed before, this is a temporary state and at some point its CTD function should be merged either with UDLFB or with your UDL-V2.)

4. The CTD function in the driver generates a UDEV event, and the UDEV policy runs the program that issues the attach/fps ioctl.

5. The attach/fps of UDLCTD is now a hotplug event, and DRM "thinks" that a new connector changed status from disconnected to connected.

6. DRM queries the modes for the new connector. Because it's a virtual CRTC, the query lands in the virtual CRTC helpers in the GPU driver; they route it to VCRTCM, which routes it to the CTD (UDLCTD in this case). The CTD returns the modes and DRM gets them ... the rest you know better than me what happens ;-)
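
The routing in step 6 is thin plumbing; a sketch of the shape of it, where vcrtcm_get_modes() stands in for the real forwarding call:

    #include <drm/drm_crtc.h>

    /* sketch: the GPU driver's virtual-CRTC helper hands the mode query
     * to VCRTCM, which forwards it to the attached CTD; the CTD returns
     * the mode list it read from the downstream device's EDID */
    static int vcrtc_get_modes(struct drm_connector *connector)
    {
            /* returns the number of modes added, as DRM expects */
            return vcrtcm_get_modes(connector);
    }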

So this is your hotplug demo, but with the difference that the new desktop can use direct rendering. Everything that works for a normal connector works here without any additional tricks, and RANDR works seamlessly without having to do anything special. If you move away from Xorg to some other system (Wayland?), it still works, as long as the new system knows how to deal with connectors that connect and disconnect.

Everything I described above is ready to go except the UDEV event from UDLCTD and the UDEV rules to automate this. Both are straightforward and won't take long to do, so very shortly I'll be able to show the hotplug-bis.

From what you wrote in your blog, it sounds like this is exactly what you are looking for. I recognize that it disrupts your current views/plans on how this should be done, but I do want to work with you to find a suitable middle ground that covers most of the possibilities.

In case you are looking at my code to follow the above-described scenarios, please make sure you pull the latest from my github repository. I have been pushing new material since my original announcement.


> I still foresee problems with tiling, we generally don't encourage
> accel code to live in the kernel, and you'll really want a
> tiled->untiled blit for this thing,

Accel code should not go into the kernel (I fully agree), and there is nothing here that would require us to do so. Restricting my comments to the Radeon GPU (the only one I know well enough), the shaders for blit copy live in the kernel already, irrespective of the VCRTCM work. I rely on them to move the frame buffer out of VRAM to the CTD device, but I don't add any new acceleration features.

Now for de-tiling: I think it should be the responsibility of the receiving CTD device, not the GPU pushing the data. (Alan mentioned this during the initial round of comments, and although I didn't respond to it at the time, that has been my view as well.)

Even if you wanted to use the GPU for de-tiling (and I'll explain shortly why you should not), it would not require any new accel code in the kernel. It would merely require one bit flip in the setup of the blit copy that already lives in the kernel.
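
Roughly, and going from memory of the r600 blit setup rather than quoting it exactly, that bit flip amounts to declaring the source surface tiled and the destination linear:

    /* sketch from memory, not exact r600 register/field names: the blit
     * setup just programs different array modes on the two surfaces */
    src_array_mode = ARRAY_2D_TILED_THIN1;   /* read tiles from VRAM */
    dst_array_mode = ARRAY_LINEAR_GENERAL;   /* scatter linear writes */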

However, de-tiling in the GPU is a bad idea for two reasons. I tried it, just as an experiment, on Radeon GPUs and watched with a PCI Express analyzer what happens on the bus (yeah, I have some "heavy weapons" in my lab). Normally a tile is a contiguous range of memory locations in VRAM. If the blit-copy function is told to assume a tiled source and a linear destination (de-tiling), it will read a contiguous set of addresses in VRAM, but then scatter 8 rows of 8 pixels each onto a non-contiguous set of destination addresses. If the destination is the PCI Express bus, that results in 8 32-byte write transactions instead of 2 128-byte transactions per tile. That chokes the throughput of the bus right there.
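
To put numbers on that (assuming 8x8 tiles of 32-bit pixels and a 128-byte maximum PCIe payload, consistent with the figures above):

    one 8x8 tile, 32 bpp:   8 * 8 * 4 bytes  = 256 bytes, contiguous in VRAM
    linear destination:     256 / 128        = 2 write transactions per tile
    de-tiled destination:   8 rows * 32 bytes = 8 write transactions per tile,
                            each row landing one scanline pitch apart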

BTW, this is the crux of the blit-copy performance improvement that you got from me back in October. Since blit-copy deals with copying a linear array, playing with the tiled/non-tiled bits only affects the order in which addresses are accessed, so the trick was to get rid of short PCIe transactions and also to shape the linear-to-rectangle mapping so the address pattern is friendlier to the host.


> also for Intel GPUs where you have
> UMA, would you read from the UMA.


Yes, the read would be from UMA. I have not yet looked at Intel GPUs in detail, so I don't have an answer for you on what problems would pop up and how to solve them, but I'll be glad to revisit the Intel discussion once I have done some homework.

An initial thought is that frame buffers on Intel are, at the end of the day, pages in system memory, so anyone/anything can get to them if they are correctly mapped.


> It also doesn't solve the optimus GPU problem in any useful fashion,
> since it can't deal with all the use cases, so we still have to write
> an alternate solution that can deal with them, so we just end up with
> two answers.


Can you elaborate on the specific use cases that concern you? I have had this case in mind and I think I can make it work. First I would have to add CTD functionality to the Intel driver, which should be straightforward. Once I get there, I'll be ready to experiment, and we'll probably be in a better position to discuss the specifics (i.e. when we have something working to compare with what you did in the PRIME experiment), but it would be good to know your specific concerns early.


thanks,

Ilija

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel

