On Wed, 23 Nov 2011, Dave Airlie wrote:
> So another question I have is how you would intend this to work from a
> user POV, like how it would integrate with a desktop environment, X or
> wayland, i.e. with little or no configuration.

First thing to understand is that when a virtual CRTC is created, it looks
to the user like the GPU has an additional DisplayPort connector. At
present I "abuse" DisplayPort, but I have seen that you pushed a patch
from VMware that adds a Virtual connector type, so eventually I'll switch
to that naming. The number of virtual CRTCs is determined when the driver
loads; it is a static configuration parameter. This does not restrict the
user, because unused virtual CRTCs are just like disconnected connectors
on the GPU. In the extreme case, a user could max out the number of
virtual CRTCs (i.e. 32 minus the number of physical CRTCs), but in general
the system needs to be booted with the maximum number of anticipated
CRTCs. Run-time addition and removal of CRTCs is not supported at this
time; that would be much harder to implement and would affect the whole
DRM module everywhere.
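
To illustrate the static configuration, the count can be exposed as a
load-time module parameter along these lines (the parameter name here is
illustrative, not necessarily what my tree uses):

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    /* Illustrative only: number of virtual CRTCs to create at load time. */
    static int vcrtc_count;
    module_param(vcrtc_count, int, 0444);
    MODULE_PARM_DESC(vcrtc_count,
                     "Number of virtual CRTCs to create when the driver loads");

With something like that in place, booting with, say, vcrtc_count=2 gives
two virtual CRTCs that just sit there as disconnected connectors until a
CTD is attached.
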
So now we have a system that booted up and DRM sees all of its real
connectors as well as the virtual ones (as DisplayPorts at present). If
there is no CTD device attached to a virtual CRTC, its virtual connector
is disconnected as far as DRM is concerned. Now userspace must call the
"attach/fps" ioctl to associate CTDs with CRTCs. I'll explain shortly how
to automate that and take the burden off the user, but for now, please
assume that "attach/fps" gets called from userland somehow. When the
attach happens, VCRTCM generates a hotplug event to DRM, just like someone
plugged in a monitor. Then when Xorg starts, it will use the DisplayPort
that represents a virtual CRTC just like any other connector. How it uses
it will depend on what xorg.conf says, but the key point is that this
connector is no different from any other connector that the GPU provides
and is thus treated as an "equal citizen". No special configuration is
necessary once it is attached to a CTD.
If a CTD is detached and a new CTD attached, that is just like yanking out
a monitor cable and plugging in a new one. DRM will get all the hotplug
events and the windowing system will do the same thing it would normally
do with any other port. If RANDR is called to resize the desktop, that
also works, and X will have no idea that one of the connectors is on a
virtual CRTC. I also have another feature: when a CTD is attached, it can
ask the device it drives for its connection status and propagate that all
the way back to DRM (this is useful for CTD devices that drive real
monitors, like DisplayLink).
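
In terms of code, all VCRTCM has to do on attach (or detach) is what any
hot-plug interrupt path does; a minimal sketch (the function name on the
VCRTCM side is illustrative, the DRM helper is the standard one):

    #include <drm/drmP.h>
    #include <drm/drm_crtc_helper.h>

    /*
     * Illustrative sketch: on attach/detach of a CTD, tell DRM that
     * connector status may have changed.  DRM then re-probes the
     * connector and notifies userspace, exactly as for a physical
     * hot-plug.
     */
    static void vcrtcm_notify_hotplug(struct drm_device *dev)
    {
            drm_helper_hpd_irq_event(dev);
    }
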
So now let's go back to the attach/fps ioctl. To attach a CTD device, this
ioctl must be issued as a result of some policy. That can be done by
having the CTD device generate a UDEV event when it loads, for which one
can write rules that determine which CTD device attaches to which virtual
CRTC. Ultimately that becomes a user configuration, but it's no different
from what one already does with UDEV rules to customize the system.
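
As a concrete (but hypothetical) example of the helper that a UDEV rule
would run: the device node, ioctl number, and argument structure below are
placeholders for whatever the VCRTCM interface ends up exporting, so treat
this as a sketch of the mechanism rather than the actual ABI.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/ioctl.h>

    /* Placeholder argument layout; the real one is whatever VCRTCM defines. */
    struct vcrtcm_attach_args {
            unsigned int crtc_id;   /* virtual CRTC to bind */
            unsigned int ctd_id;    /* CTD device to attach */
            unsigned int fps;       /* frame-push rate */
    };

    /* Placeholder ioctl number. */
    #define VCRTCM_IOC_ATTACH_FPS _IOW('V', 0x01, struct vcrtcm_attach_args)

    int main(int argc, char **argv)
    {
            struct vcrtcm_attach_args args = {
                    .crtc_id = argc > 1 ? atoi(argv[1]) : 0,
                    .ctd_id  = argc > 2 ? atoi(argv[2]) : 0,
                    .fps     = argc > 3 ? atoi(argv[3]) : 60,
            };
            int fd = open("/dev/dri/card0", O_RDWR);  /* placeholder node */

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (ioctl(fd, VCRTCM_IOC_ATTACH_FPS, &args) < 0) {
                    perror("attach/fps");
                    close(fd);
                    return 1;
            }
            close(fd);
            return 0;
    }

The UDEV rule then only needs a RUN entry that invokes such a helper when
the CTD driver announces itself; the exact match keys depend on what the
CTD driver reports.
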
Having explained this, let's take the hotplug example that you put up on
your web page and redo it with virtual CRTCs. Here is how it would work:
boot the system and tell the GPU to create a few virtual CRTCs. Bring up
Xorg with no DisplayLink dongles plugged in. Now plug in the DisplayLink.
The CTD driver loads as a result of the hotplug and its CTD function
generates a UDEV event (right now UDLCTD is a separate driver but, as we
discussed before, this is a temporary state and at some point its CTD
function should be merged either with UDLFB or with your UDL-V2). The rule
directs UDEV to run the program that issues the attach/fps ioctl.
Attaching UDLCTD is now a hotplug event and DRM "thinks" that a new
connector changed its status from disconnected to connected. That causes
it to query the modes for the new connector and, because it is a virtual
CRTC, the query lands in the virtual CRTC helpers in the GPU driver. The
virtual CRTC helpers route it to VCRTCM, which further routes it to the
CTD (UDLCTD in this case). The CTD returns the modes and DRM gets them ...
the rest you know better than I do ;-)
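
In code, that last routing step is nothing exotic; the ->get_modes helper
of the virtual connector just forwards the query through VCRTCM to the
attached CTD. A rough sketch (vcrtcm_get_edid() stands in for that query
and is illustrative; the rest are the standard DRM helpers):

    #include <linux/slab.h>
    #include <drm/drmP.h>
    #include <drm/drm_crtc_helper.h>
    #include <drm/drm_edid.h>

    /* Illustrative: ask VCRTCM (and through it, the attached CTD) for an EDID. */
    struct edid *vcrtcm_get_edid(struct drm_connector *connector);

    /* Sketch of the virtual connector's ->get_modes helper. */
    static int virtual_connector_get_modes(struct drm_connector *connector)
    {
            struct edid *edid;
            int count;

            /* Route the query through VCRTCM to the attached CTD. */
            edid = vcrtcm_get_edid(connector);
            if (!edid)
                    return 0;       /* no CTD attached: no modes to report */

            drm_mode_connector_update_edid_property(connector, edid);
            count = drm_add_edid_modes(connector, edid);
            kfree(edid);

            return count;
    }
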
So this is your hotplug demo, but with the difference that the new desktop
can use direct rendering. Also, everything that works for a normal
connector works here without any additional tricks; RANDR works seamlessly
without having to do anything special. If you move away from Xorg to some
other system (Wayland?), it still works as long as the new system knows
how to deal with connectors that connect and disconnect.
Everything I described above is ready to go except the UDEV event from
UDLCTD and the UDEV rules to automate this. Both are straightforward and
won't take long to do. So very shortly, I'll be able to show the
hotplug-bis demo.
From what you wrote in your blog, it sounds like this is exactly what you
are looking for. I recognize that it disrupts your current views/plans on
how this should be done, but I do want to work with you to find a suitable
middle ground that covers most of the possibilities.
In case you are looking at my code to follow the scenarios described
above, please make sure you pull the latest stuff from my github
repository. I have been pushing new material since my original
announcement.

> I still foresee problems with tiling, we generally don't encourage
> accel code to live in the kernel, and you'll really want a
> tiled->untiled blit for this thing,

Accel code should not go into the kernel (on that I fully agree) and there
is nothing here that requires us to put it there. Restricting my comments
to the Radeon GPU (the only one I know well enough): the shaders for the
blit copy already live in the kernel, irrespective of the VCRTCM work. I
rely on them to move the frame buffer out of VRAM to the CTD device, but I
don't add any additional features.
Now for de-tiling: I think it should be the responsibility of the
receiving CTD device, not the GPU pushing the data (Alan mentioned that
during the initial round of comments and, although I didn't respond to it
at the time, that has been my view as well).
Even if you wanted to use the GPU for de-tiling (and I'll explain shortly
why you should not), it would not require any new accel code in the
kernel; it would merely require flipping one bit in the setup of the blit
copy that already lives there.
However, de-tiling in the GPU is a bad idea. I tried it, just as an
experiment, on Radeon GPUs and watched with a PCI Express analyzer what
happens on the bus (yeah, I have some "heavy weapons" in my lab). Normally
a tile is a contiguous array of memory locations in VRAM. If the blit-copy
function is told to assume a tiled source and a linear destination
(de-tiling), it reads a contiguous set of addresses in VRAM, but then
scatters 8 rows of 8 pixels each onto non-contiguous addresses at the
destination. If the destination is the PCI Express bus, that results in
eight 32-byte write transactions instead of two 128-byte transactions per
tile. That chokes the throughput of the bus right there.
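
To put numbers on it (assuming 32-bit pixels, which is what the 8-pixel,
32-byte rows above imply):

    #include <stdio.h>

    /* One 8x8 tile, assuming 32 bits per pixel. */
    int main(void)
    {
            const int tile_w = 8, tile_h = 8, bytes_per_pixel = 4;
            const int tile_bytes = tile_w * tile_h * bytes_per_pixel;  /* 256 */

            /* Linear destination: the whole tile goes out in long bursts. */
            printf("linear:   %d x 128-byte writes\n", tile_bytes / 128);  /* 2 */

            /* De-tiled destination: one short burst per 8-pixel row. */
            printf("de-tiled: %d x %d-byte writes\n",
                   tile_h, tile_w * bytes_per_pixel);                      /* 8 x 32 */
            return 0;
    }
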
BTW, this is the crux of the blit-copy performance improvement that you
got from me back in October. Since blit-copy deals with copying a linear
array, playing with the tiled/non-tiled bits only affects the order in
which addresses are accessed, so the trick was to get rid of the short
PCIe transactions and also to shape the linear-to-rectangle mapping so
that the address pattern is more friendly to the host.

> also for Intel GPUs where you have
> UMA, would you read from the UMA.

Yes, the read would be from UMA. I have not yet looked at Intel GPUs in
detail, so I don't have an answer for you on what problems would pop up
and how to solve them, but I'll be glad to revisit the Intel discussion
once I do some homework.
My initial thought is that Intel frame buffers are, at the end of the day,
pages in system memory, so anyone/anything can get to them if they are
correctly mapped.

> It also doesn't solve the optimus GPU problem in any useful fashion,
> since it can't deal with all the use cases, so we still have to write
> an alternate solution that can deal with them, so we just end up with
> two answers.

Can you elaborate on the specific use cases that concern you? I have had
this case in mind and I think I can make it work. First I would have to
add CTD functionality to the Intel driver; that should be straightforward.
Once I get there, I'll be ready to experiment and we'll probably be in a
better position to discuss the specifics (i.e. when we have something
working to compare with what you did in the PRIME experiment), but it
would be good to know your specific concerns early.
thanks,
Ilija