[RFC] Using DC in amdgpu for upcoming GPU

Tony.Cheng@xxxxxxx (Cheng, Tony) · Fri, 9 Dec 2016 17:56:01 +0000

> Merging this code as well as maintaining a trust relationship with 
> Linus, also maintains a trust relationship with the Linux graphics 
> community and other drm contributors. There have been countless 
> requests from various companies and contributors to merge unsavoury 
> things over the years and we've denied them. They've all had the same 
> reasons behind why they couldn't do what we want and why we were 
> wrong, but lots of people have shown up who do get what we are at and 
> have joined the community and contributed drivers that conform to the standards.
> Turning around now and saying well AMD ignored our directions, so 
> we'll give them a free pass even though we've denied you all the same 
> thing over time.

I'd like to say that I acknowledge the good and hard work maintainers are doing.  You nor the community is wrong to say no. I understand where the no comes from.  If somebody wants to throw 100k lines into DAL I would say no as well.

> If I'd given in and merged every vendor coded driver as-is we'd never 
> have progressed to having atomic modesetting, there would have been 
> too many vendor HALs and abstractions that would have blocked forward 
> progression. Merging one HAL or abstraction is going to cause  pain, 
> but setting a precedent to merge more would be just downright stupid 
> maintainership.

> Here's the thing, we want AMD to join the graphics community not hang 
> out inside the company in silos. We need to enable FreeSync on Linux, 
> go ask the community how would be best to do it, don't shove it inside 
> the driver hidden in a special ioctl. Got some new HDMI features that 
> are secret, talk to other ppl in the same position and work out a plan 
> for moving forward. At the moment there is no engaging with the Linux 
> stack because you aren't really using it, as long as you hide behind 
> the abstraction there won't be much engagement, and neither side 
> benefits, so why should we merge the code if nobody benefits?

> The platform problem/Windows mindset is scary and makes a lot of 
> decisions for you, open source doesn't have those restrictions, and I 
> don't accept drivers that try and push those development model 
> problems into our codebase.

I would like to share how platform problem/Windows mindset look from our side.  We are dealing with ever more complex hardware with the push to reduce power while driving more pixels through.  It is the power reduction that is causing us driver developers most of the pain.  Display is a high bandwidth real time memory fetch sub system which is always on, even when the system is idle.  When the system is idle, pretty much all of power consumption comes from display.  Can we use existing DRM infrastructure?  Definitely yes, if we talk about modes up to 300Mpix/s and leaving a lot of voltage and clock margin on the table.  How hard is it to set up a timing while bypass most of the pixel processing pipeline to light up a display?  How about adding all the power optimization such as burst read to fill display cache and keep DRAM in self-refresh as much as possible?  How about powering off some of the cache or pixel processing pipeline if we are not using them?  We need to manage and maximize valuable resources like cache (cache == silicon area == $$) and clock (== power) and optimize memory request patterns at different memory clock speeds, while DPM is going, in real time on the system.  This is why there is so much code to program registers, track our states, and manages resources, and it's getting more complex as HW would prefer SW program the same value into 5 different registers in different sub blocks to save a few cross tile wires on silicon and do complex calculations to find the magical optimal settings (the hated bandwidth_cals.c).  There are a lot of registers need to be programmed to correct values in the right situation if we enable all these power/performance optimizations.

It's really not a problem of windows mindset, rather is what is the bring up platform when silicon is in the lab with HW designer support.  Today no surprise we do that almost exclusively on windows.  Display team is working hard to change that to have linux in the mix while we have the attention from HW designers.  We have a recent effort to try to enable all power features on Stoney (current gen low power APU) to match idle power on windows after Stoney shipped.  Linux driver guys working hard on it for 4+ month and still having hard time getting over the hurdle without support from HW designers because designers are tied up with the next generation silicon currently in the lab and the rest of them already moved onto next next generation.  To me I would rather have everything built on top of DC, including HW diagnostic test suites.  Even if I have to build DC on top of DRM mode setting I would prefer that over trying to do another bring up without HW support.  After all as driver developer refactoring and changing code is more fun than digging through documents/email and experimenting with different combination of settings in register and countless of reboots to try get pass some random hang.  

FYI, just dce_mem_input.c programs over 50 distinct register fields, and DC for current generation ASIC doesn't yet support all features and power optimizations.  This doesn't even include more complex programming model in future generation with HW IP getting more modular.  We are already making progress with bring up with shared DC code for next gen ASIC in the lab. DC HW programming / resource management / power optimization will be fully validated on all platforms including Linux and that will benefit the Linux driver running on AMD HW, especially in battery life.

Just in case you are wondering Polaris windows driver isn't using DC and was on a "windows architecture" code base.  We understand that from community point of view you are not getting much feature / power benefit yet because CI/VI/CZ/Polaris Linux driver with DC is only used in Linux and we donâ??t have the man power to make it fully optimized yet.  Next gen will be performance and power optimized at launch.  I acknowledge that we don't have full feature on Linux yet and we still need to work with community to amend DRM to enable FreeSync, HDR, next gen resolution and other display feature just made available in Crimson ReLive.  However it's not realistic to engage with community early on in these efforts, as up to 1 month prior to release we were still experimenting with different solutions to make the feature better and we wouldn't have known what we end up building half year ago.  And of course marketing wouldn't let us leak these features before Crimson launch.

I would like to work with the community and I think we have shown that we welcome, appreciate and take feedback seriously.  There is plenty of work done in DC addressing some of the easier to fix problems while we have next gen ASIC in the lab as top priority.  We are already down to 66k lines of code from 93k through refactoring and remove numerous abstractions.  We can't just tear apart the "mid layer" or "HAL" over night.  Plenty of work need to be done to understand if/how we can fit resource optimization complexity into existing DRM framework.  If you look at DC structure closely, we created them to plug into DRM structures (ie.  dc_surface == FB/plane, dc_stream ~= CRTC, dc_link+dc_sink = encoder + connector), but we need a resource layer to decide how to realize the given "state" with our HW.  The problem is not getting simpler as on top of multi-plane combine, shared encoders and clock resources,  compression is starting to get into display domain.  By the way, existing DRM structure do fit nicely for HW of 4 generations ago, and with current windows driver we do have concept of crtc, encoders, connector. However over the years complexity has grown and resource management is becoming a problem, which led us to design of putting in a resource management layer.  We might not be supporting full range of what atomic can do and our semantics may be different at this stage of development, but saying dc_validate breaks atomic only tells me you haven't take a close look at our DC code.  For us all validation runs same topology/resource algorithm in check and commit.  It's not optimal yet as we will end up doing this algorithm twice today on a commit but we do intend to fix it over time.  I welcome any concrete suggestions on using existing framework to solve the resource/topology management issue.  It's not too late to change DC now but after couple year after more OS and ASICs are built on top of DC it will be very difficult to change. 

> Now the reason I bring this up (and we've discussed it at length in
> private) is that DC still suffers from a massive abstraction midlayer. 
> A lot of the back-end stuff (dp aux, i2c, abstractions for allocation, 
> timers, irq, ...) have been cleaned up, but the midlayer is still there.
> And I understand why you have it, and why it's there - without some OS 
> abstraction your grand plan of a unified driver across everything 
> doesn't work out so well.
>
> But in a way the backend stuff isn't such a big deal. It's annoying 
> since lots of code, and bugfixes have to be duplicated and all that, 
> but it's fairly easy to fix case-by-case, and as long as AMD folks 
> stick around (which I fully expect) not a maintainance issue. It makes 
> it harder for others to contribute, but then since it's mostly the 
> leaf it's generally easy to just improve the part you want to change 
> (as an outsider). And if you want to improve shared code the only 
> downside is that you can't also improve amd, but that's not so much a 
> problem for non-amd folks ;-)

Unfortunately duplicating bug fixes is not trivial and if code base diverge some of the fixes will be different.  Surprisingly if you track where we spend our time, < 20% is writing code.  Probably 50% is trying to figure out which register need a different value programmed in those situations. The other 30% is trying to make sure the change doesnâ??t break other stuff in different scenarios.  If power and performance optimizations remains off in Linux then I would agree with your assessment.  

> I've only got one true power as a maintainer, and that is to say No.

We AMD driver developer only got 2 true power over community, and that is having access to internal documentation and HW designers.  Not pulling Linux into the mix while silicon is still in the lab means we lose half of our power (HW designer support).   

> I've also wondered if the DC code is ready for being part of the 
> kernel anyways, what happens if I merge this, and some external 
> contributor rewrites 50% of it and removes a bunch of stuff that the 
> kernel doesn't need. By any kernel standards I'll merge that sort of 
> change over your heads if Alex doesn't, it might mean you have to 
> rewrite a chunk of your internal validation code, or some other 
> interactions, but those won't be reasons to block the changes from my 
> POV. I'd like some serious introspection on your team's part on how 
> you got into this situation and how even if I was feeling like merging 
> this (which I'm not) how you'd actually deal with being part of the 
> Linux kernel and not hiding in nicely framed orgchart silo behind a 
> HAL.

We have come a long way compare to how we used to be windows centric, and I am sure there is plenty of work remaining for us to be ready to be part of the kernel.  If community has clever and clean solution that doesnâ??t break our ASICs weâ??ll take it internally with open arms.  We merged Dave and Jeromeâ??s clean up on removing abstractions and we had lots of patches following Dave and Jeromeâ??s lead in different area.  

Again this is not about orgchart.  Itâ??s about whatâ??s validated when samples are in the lab.

God I miss the day when everything is plugged into the wall and dual link DVI was cutting edge.  At least most of our problem can be solved by diffing register dump between good and bad case.  

Tony