[RFC] Using DC in amdgpu for upcoming GPU

tony.cheng@xxxxxxx (Cheng, Tony) · Mon, 12 Dec 2016 21:52:08 -0500

On 12/11/2016 9:57 PM, Dave Airlie wrote:
> On 8 December 2016 at 12:02, Harry Wentland<harry.wentland at amd.com>  wrote:
>> We propose to use the Display Core (DC) driver for display support on
>> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
>> avoid a flag day the plan is to only support uGPU initially and transition
>> to older ASICs gradually.
> [FAQ: from past few days]
>
> 1) Hey you replied to Daniel, you never addressed the points of the RFC!
> I've read it being said that I hadn't addressed the RFC, and you know
> I've realised I actually had, because the RFC is great but it
> presupposes the codebase as designed can get upstream eventually, and
> I don't think it can. The code is too littered with midlayering and
> other problems, that actually addressing the individual points of the
> RFC would be missing the main point I'm trying to make.
>
> This code needs rewriting, not cleaning, not polishing, it needs to be
> split into its constituent parts, and reintegrated in a form more
> Linux process friendly.
>
> I feel that if I reply to the individual points Harry has raised in
> this RFC, that it means the code would then be suitable for merging,
> which it still won't, and I don't want people wasting another 6
> months.
>
> If DC was ready for the next-gen GPU it would be ready for the current
> GPU, it's not the specific ASIC code that is the problem, it's the
> huge midlayer sitting in the middle.

We would love to upstream DC for all supported asic!  We made enough 
change to make Sea Island work but it's really not validate to the 
extend we validate Polaris on linux and no where close to what we do for 
2017 ASICs.  With DC the display hardware programming, resource 
optimization, power management and interaction with rest of system will 
be fully validated across multiple OSs.  Therefore we have high 
confidence that the quality is going to better than what we have 
upstreammed today.

I don't have a baseline to say if DC is in good enough quality for older 
generation compare to upstream.  For example we don't have HW generate 
bandwidth_calc for DCE 8/10 (Sea/Vocanic island family) but our code is 
structured in a way that we assume bandwidth_calc is there.  None of us 
feel like go untangle the formulas in windows driver at this point to 
create our own version of bandwidth_calc. It sort of work with HW 
default values but some mode / config is likely to underflows.  If 
community is okay with uncertain quality, sure we would love to upstream 
everything to reduce our maintaince overhead.  You do get audio with DC 
on DCE8 though.

> 2) We really need to share all of this code between OSes, why does
> Linux not want it?
>
> Sharing code is a laudable goal and I appreciate the resourcing
> constraints that led us to the point at which we find ourselves, but
> the way forward involves finding resources to upstream this code,
> dedicated people (even one person) who can spend time on a day by day
> basis talking to people in the open and working upstream, improving
> other pieces of the drm as they go, reading atomic patches and
> reviewing them, and can incrementally build the DC experience on top
> of the Linux kernel infrastructure. Then having the corresponding
> changes in the DC codebase happen internally to correspond to how the
> kernel code ends up looking. Lots of this code overlaps with stuff the
> drm already does, lots of is stuff the drm should be doing, so patches
> to the drm should be sent instead.

Maybe let me share what we are doing and see if we can come up with 
something to make DC work for both upstream and our internal need. We 
are sharing code not just on Linux and we will do our best to make our 
code upstream friendly.  Last year we focussed on having enough code to 
prove that our DAL rewrite works and get more people contributing to 
it.  We rush a bit as a result we had a few legacy component we port 
from Windows driver and we know it's bloat that needed to go.

We designed DC so HW can contribute bandwidth_calc magic and psuedo code 
to program the HW blocks.  The HW blocks on the bottom of DC.JPG in 
models our HW blocks and the programming sequence are provided by HW 
engineers.  If a piece of HW need a bit toggled 7 times during power up 
I rather have HW engineer put that in their psedo code rather than me 
trying to find that sequence in some document.  Afterall they did 
simulate the HW with the toggle sequence.  I guess these are back-end 
code Daniel talked about.  Can we agree that DRM core is not interested 
in how things are done in that layer and we can upstream these as it?

The next is dce_hwseq.c to program the HW blocks in correct sequence.  
Some HW block can be programmed in any sequence, but some requires 
strict sequence to be followed.  For example Display CLK and PHY CLK 
need to be up before we enable timing generator.  I would like these 
sequence to remain in DC as it's really not DRM's business to know how 
to program the HW.  In a way you can consider hwseq as a helper to 
commit state to HW.

Above hwseq is the dce*_resource.c.  It's job is to come up with the HW 
state required to realize given config.  For example we would use the 
exact same HW resources with same optimization setting to drive any same 
given config.  If 4 x 4k at 60 is supported with resource setting A on HW 
diagnositc suite during bring up setting B on Linux then we have a 
problem.  It know which HW block work with which block and their 
capability and limitations.  I hope you are not asking this stuff to 
move up to core because in reality we should probably hide this in some 
FW, as HW expose the register to config them differently that doesn't 
mean all combination of HW usage is validated.   To me resource is more 
of a helper to put together functional pipeline and does not make any 
decision that any OS might be interested in.

These yellow boxes in DC.JPG are really specific to each generation of 
HW and changes frequently.  These are things that HW has consider hiding 
it in FW before.  Can we agree on those code (under /dc/dce*) can stay?

DAL3.JPG shows how we put this all together.  The core part is design to 
behave like helper, except we try to limit the entry point and opted for 
caller to build desire state we want DC to commit to. It didn't make 
sense for us to expose hundred of function (our windows dal interface 
did) and require caller to call these helpers in correct sequence.  
Caller builds absolute state it want to get to and DC will make it 
happen with the HW available.

> 3) Then how do we upstream it?
> Resource(s) need(s) to start concentrating at splitting this thing up
> and using portions of it in the upstream kernel. We don't land fully
> formed code in the kernel if we can avoid it. Because you can't review
> the ideas and structure as easy as when someone builds up code in
> chunks and actually develops in the Linux kernel. This has always
> produced better more maintainable code. Maybe the result will end up
> improving the AMD codebase as well.
Is this about demonstration how basic functionality work and add more 
features with series of patches to make review eaiser?  If so I don't 
think we are staff to do this kind of rewrite.  For example it make no 
sense to hooking up bandwidth_calc to calculate HW magic if we don't 
have mem_input to program the memory settings.  We need portion of 
hw_seq to ensure these blocks are programming in correct sequence.  We 
will need to feed bandwidth_calc it's required inputs, which is 
basically the whole system state tracked in validate_context today, 
which means we basically need big bulk of resource.c.  This effort might 
have benefit in reviewing the code, but we will end up with pretty much 
similar if not the same as what we already have.

Or is the objection that we have the white boxes in DC.JPG instead of 
using DRM objects?  We can probably workout something to have the white 
boxes derive from DRM objects and extend atomic state with our 
validate_context where dce*_resource.c stores the constructed pipelines.

> 4) Why can't we put this in staging?
> People have also mentioned staging, Daniel has called it a dead end,
> I'd have considered staging for this code base, and I still might.
> However staging has rules, and the main one is code in staging needs a
> TODO list, and agreed criteria for exiting staging, I don't think we'd
> be able to get an agreement on what the TODO list should contain and
> how we'd ever get all things on it done. If this code ended up in
> staging, it would most likely require someone dedicated to recreating
> it in the mainline driver in an incremental fashion, and I don't see
> that resource being available.
>
> 5) Why is a midlayer bad?
> I'm not going to go into specifics on the DC midlayer, but we abhor
> midlayers for a fair few reasons. The main reason I find causes the
> most issues is locking. When you have breaks in code flow between
> multiple layers, but having layers calling back into previous layers
> it becomes near impossible to track who owns the locking and what the
> current locking state is.
>
> Consider
>      drma -> dca -> dcb -> drmb
>      drmc -> dcc  -> dcb -> drmb
>
> We have two codes paths that go back into drmb, now maybe drma has a
> lock taken, but drmc doesn't, but we've no indication when we hit drmb
> of what the context pre entering the DC layer is. This causes all
> kinds of problems. The main requirement is the driver maintains the
> execution flow as much as possible. The only callback behaviour should
> be from an irq or workqueue type situations where you've handed
> execution flow to the hardware to do something and it is getting back
> to you. The pattern we use to get our of this sort of hole is helper
> libraries, we structure code as much as possible as leaf nodes that
> don't call back into the parents if we can avoid it (we don't always
> succeed).
Okay.  by the way DC does behave like a helper for most part.  There is 
no locking in DC.  We work enough with different OS to know they all 
have different synchronization primatives and interrupt handling and 
have DC lock anything is just shooting ourself in the foot.  We do have 
function with lock in their function name in DC but those are HW 
register lock to ensure that the HW register update atomically. ie have 
50 register write latch in HW at next vsync to ensure HW state change on 
vsync boundary.

> So the above might becomes
>     drma-> dca_helper
>             -> dcb_helper
>             -> drmb.
>
> In this case the code flow is controlled by drma, dca/dcb might be
> modifying data or setting hw state but when we get to drmb it's easy
> to see what data is needs and what locking.
>
> DAL/DC goes against this in so many ways, and when I look at the code
> I'm never sure where to even start pulling the thread to unravel it.
I don't know where we go against it.  In the case we do callback to DRM 
for MST case we have

amdgpu_dm_atomic_commit (implement atomic_commit)
dc_commit_targets (commit helper)
dce110_apply_ctx_to_hw (hw_seq)
core_link_enable_stream (part of MST enable sequence)
allocate_mst_payload (helper for above func in same file)
dm_helpers_dp_mst_write_payload_allocation_table (glue code to call DRM)
drm_dp_mst_allocate_vcpi (DRM)

As you see even in this case we are only 6 level deep before we callback 
to DRM, and 2 of those functions are in same file as helper func of the 
bigger sequence.

Can you clarify the distinction between what you would call a mid layer 
vs helper.  We consulted Alex a lot and we know about this inversion of 
control pattern and we are trying our best to do it. Is it the way 
functions are named and files folder structure?  Would it help if we 
flatten amdgpu_dm_atomic_commit and dc_commit_targets?  Even if we do I 
would imagine we want some helper in commit rather a giant 1000 line 
function.  Is there any concern that we put dc_commit_targets under /dc 
folder as we want other platform to run exact same helper?  Or this is 
about the state dc_commit_targets is too big?  or the state is stored 
validate_context rather than drm_atomic_state?

I don't think it make sense for DRM to get into how we decide to use our 
HW blocks.  For example any refactor done in core should not result in 
us using different pipeline to drive the same config.  We would like to 
have control over how our HW pipeline is constructed.

> Some questions I have for AMD engineers that also I'd want to see
> addressed before any consideration of merging would happen!
>
> How do you plan on dealing with people rewriting or removing code
> upstream that is redundant in the kernel, but required for internal
> stuff?

Honestly I don't know what these are.  Like you and Jerome remove func 
ptr abstraction (I know it was bad, that was one of the component we 
ported from windows) and we need to keep it as function pointer so we 
can still run our code on FPGA before we see first silicon?  I don't 
think if we nak the function ptr removal will be a problem for 
community.  The rest is valued and we took with open arm.

Or this is more like we have code duplication after DRM added some 
functionality we can use?  I would imaging its more of moving what we 
got working in our code to DRM core if we are upstreamed and we have no 
problem accomodate for that as the code moved out to DRM core can be 
included in other platforms.  We don't have any private ioctl today and 
we don't plan to have any outside of using DRM object properties.

> How are you going to deal with new Linux things that overlap
> incompatibly with your internally developed stuff?
I really don't know what those new linux things can be that could cause 
us problem.  If anything the new things will be probably come from us if 
we are upstreammed.

atomic: we had that on windows 8 years ago for windows vista, yes 
sematic/abstraction is different but concept is the same. We could have 
easily settled with DRM semantics or DRM could easily take some form of 
our pattern.

DP MST:  AMD was the first source certified and we work closely with the 
first branch certified. I was a part of that team and we had a very 
solid implementation.  If we were upstreamed I don't see you would want 
to reinvent the wheel and not try to massage what we have into shape for 
DRM core for other driver to reuse.

drm_plane: windows multi-plane overlay and Andriod HW composer? We had 
that working 2 years ago.  If you are upstreammed and you are first you 
usually have a say in how it should go down don't you?

The new thing coming are Free Sync HDR, 8k at 60 with DP DSC etc.  I would 
imaging we would beat all other vendor to the first open source solution 
if we leverage effort from our extended display team.

> If the code is upstream will it be tested in the kernel by some QA
> group, or will there be some CI infrastructure used to maintain and to
> watch for Linux code that breaks assumptions in the DC code?
We have tester that runs set of display test every other day on linux.  
We don't run on DRM_Next tree yet and Alex is working out a plan to 
allow us use DRM_Next as our development branch.  Upstream is not likely 
to be tested by QA though.

DC does not assume anything.  DC require full state given in 
dc_commit_targets / dc_commit_surfaces_to_target.  we do whatever is 
specified in the data structure.  dc_commit_surfaces_to_target can be 
considered as a helper function to change plane without visual side 
effect on vsync bondary.  dc_commit_targets can be considered as a 
helper function to light up a display with black screen.  DRM core has 
full control if you want to light up to black screen as soon as monitor 
is plugged in or you want to light up after someone does a mode set.  
Hotplug interrupt goes to amdgpu_dm, and it will take the require lock 
in DRM object because calling DC to detect.
> Can you show me you understand that upstream code is no longer 100% in
> your control and things can happen to it that you might not expect and
> you need to deal with it?
I think so, other than we haven't been spanning the mailing list. We 
already dealing with we don't control 100% our code to some extend.  We 
don't control bandwidth_calc.  Trust me we are not keeping up with the 
updates that HW is doing with it for next gen hw.  Everytime we pull 
there is a new term they added and we have to find a way to feed that 
input.  We had to clean up linux style for them everytime we pull.  Our 
HW diagnostic suite has different set of requirements and they 
frequently contribute to our code.  We took you and Jerome's patch.  If 
it's validated we want that code.

At end of the day I think the architecture is really about what's HW and 
what's DRM core.  Like I said all the yellow boxes has been proposed to 
running on firmware but we decide to keep them in driver as it's easier 
to debug on x86 than uC.  I can tell you that our HW guys were happy 
when I decide to open source bandwidth_calc but we did it anyways.  I 
feel like because we are opening up the complexity and inner working of 
our HW, we are somehow getting penalized for being open.
>
> Dave.
Tony
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20161212/be5df61c/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DC.JPG
Type: image/jpeg
Size: 147686 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20161212/be5df61c/attachment-0002.jpe>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAL3.JPG
Type: image/jpeg
Size: 117556 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20161212/be5df61c/attachment-0003.jpe>