Re: [RFC] Using DC in amdgpu for upcoming GPU

On 12/13/2016 2:30 AM, Dave Airlie wrote:
(hit send too early)
We would love to upstream DC for all supported ASICs!  We made enough changes
to make Sea Islands work, but it's really not validated to the extent we
validate Polaris on Linux, and nowhere close to what we do for 2017 ASICs.
With DC the display hardware programming, resource optimization, power
management and interaction with the rest of the system will be fully validated
across multiple OSes.  Therefore we have high confidence that the quality is
going to be better than what we have upstreamed today.

I don't have a baseline to say whether DC is of good enough quality for older
generations compared to upstream.  For example, we don't have a HW-generated
bandwidth_calc for DCE 8/10 (Sea/Volcanic Islands family), but our code is
structured in a way that assumes bandwidth_calc is there.  None of us feels
like untangling the formulas in the Windows driver at this point to create our
own version of bandwidth_calc.  It sort of works with HW default values, but
some modes / configs are likely to underflow.  If the community is okay with
uncertain quality, sure, we would love to upstream everything to reduce our
maintenance overhead.  You do get audio with DC on DCE8, though.
If we get any of this upstream, we should get all of the hw supported with it.

If it regresses we just need someone to debug why.
Great, will do.

Maybe let me share what we are doing and see if we can come up with
something to make DC work for both upstream and our internal needs.  We are
sharing code beyond Linux, and we will do our best to make our code
upstream friendly.  Last year we focused on having enough code to prove
that our DAL rewrite works and on getting more people contributing to it.  We
rushed a bit; as a result we had a few legacy components ported from the
Windows driver, and we know that's bloat that needs to go.

We designed DC so the HW team can contribute the bandwidth_calc magic and
pseudocode to program the HW blocks.  The HW blocks at the bottom of DC.JPG
model our HW blocks, and their programming sequences are provided by HW
engineers.  If a piece of HW needs a bit toggled 7 times during power up, I'd
rather have the HW engineer put that in their pseudocode than have me trying
to find that sequence in some document.  After all, they did simulate the HW
with that toggle sequence.  I guess this is the back-end code Daniel talked
about.  Can we agree that DRM core is not interested in how things are done
in that layer and we can upstream these as is?

The next layer is dce_hwseq.c, which programs the HW blocks in the correct
sequence.  Some HW blocks can be programmed in any order, but some require a
strict sequence to be followed.  For example, the display clock and PHY clock
need to be up before we enable the timing generator.  I would like these
sequences to remain in DC, as it's really not DRM's business to know how to
program the HW.  In a way you can consider hwseq a helper that commits state
to HW.
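
Just to make that concrete, here is a minimal sketch of the kind of ordering
constraint hwseq exists to enforce; every name below is invented for
illustration, this is not the actual DC code:

/* Hypothetical hwseq-style helper: its only job is to apply an
 * already-validated context to HW in the order the HW requires. */
struct pipe_ctx;

void enable_display_clock(struct pipe_ctx *pipe);
void enable_phy_clock(struct pipe_ctx *pipe);
void enable_timing_generator(struct pipe_ctx *pipe);

static void hwseq_enable_pipe(struct pipe_ctx *pipe)
{
	/* Clocks must be running before the timing generator starts;
	 * the ordering below is a HW requirement, not style. */
	enable_display_clock(pipe);
	enable_phy_clock(pipe);
	enable_timing_generator(pipe);
}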

Above hwseq is dce*_resource.c.  Its job is to come up with the HW state
required to realize a given config.  For example, we would use the exact same
HW resources with the same optimization settings to drive any given config.
If 4 x 4k@60 is supported with resource setting A on the HW diagnostic suite
during bring-up but setting B on Linux, then we have a problem.  It knows
which HW block works with which block, and their capabilities and limitations.
I hope you are not asking for this stuff to move up to core, because in
reality we should probably hide this in some FW; just because HW exposes the
registers to configure things differently doesn't mean every combination of
HW usage is validated.  To me resource is more of a helper that puts together
a functional pipeline and does not make any decision that any OS might be
interested in.
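
As a rough sketch of what I mean (names and limits invented, not the real
code): resource maps a requested config onto a validated combination of
blocks, and rejects anything outside what was validated.

struct stream_config { int width, height, refresh_hz; };
struct pipe_set { int tg_index; int plane_index; int clk_source; };

/* Hypothetical resource helper: always hand out the same validated
 * combination of HW blocks for a given config, or reject it. */
static bool resource_map_config(const struct stream_config *cfg,
				struct pipe_set *out)
{
	/* The HW registers would let us configure other combinations,
	 * but only validated ones are allowed through here. */
	if (cfg->width > 4096 || cfg->refresh_hz > 120)
		return false;

	out->tg_index = 0;
	out->plane_index = 0;
	out->clk_source = 0;
	return true;
}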

These yellow boxes in DC.JPG are really specific to each generation of HW
and change frequently.  These are things the HW team has considered hiding in
FW before.  Can we agree that this code (under /dc/dce*) can stay?
I think most of these things are fine to be part of the solution we end up at,
but I can't say for certain they won't require interface changes. I think the
most useful code is probably the stuff in the dce subdirectories.
Okay, as long as we can agree this piece stays, I am sure we can make it work.

Is this about demonstrating how basic functionality works and adding more
features with a series of patches to make review easier?  If so, I don't
think we are staffed to do this kind of rewrite.  For example, it makes no
sense to hook up bandwidth_calc to calculate HW magic if we don't have
mem_input to program the memory settings.  We need a portion of hw_seq to
ensure these blocks are programmed in the correct sequence.  We will need to
feed bandwidth_calc its required inputs, which is basically the whole system
state tracked in validate_context today, which means we basically need the
big bulk of resource.c.  This effort might have a benefit for reviewing the
code, but we will end up with pretty much something similar if not the same
as what we already have.
This is something people always say, I'm betting you won't end up there at all,
it's not just review, it's an incremental development model, so that when
things go wrong we can pinpoint why and where a lot more easily. Just merging
this all in one fell swoop is going to just mean a lot of pain in the end. I
understand you aren't resourced for this sort of development on this codebase,
but it's going to be an impasse to try and merge this all at once even if it
was clean code.
How is it going to work then? Can we merge the hardware programming code (the hw objects under /dc/dce) without anything calling it?

Or is the objection that we have the white boxes in DC.JPG instead of using
DRM objects?  We can probably work out something to have the white boxes
derive from DRM objects and extend atomic state with our validate_context,
where dce*_resource.c stores the constructed pipelines.
I think Daniel explained quite well how things should look in terms of
subclassing.
Okay, we will look into how to do it. This definitely won't happen overnight, as we need to get clear on what to do first and look at how other drivers do it. As per Harry's RFC (the last still-to-do item), we plan to work on atomic anyway; this expands the scope of that a bit.
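
For reference, the subclassing pattern other atomic drivers use looks roughly
like the sketch below; the struct and field names here are hypothetical, not a
proposal for the final DC structures:

#include <linux/slab.h>
#include <drm/drm_atomic_helper.h>

/* Hypothetical driver-private CRTC state embedding the DRM state as
 * its first member; validate_context-style data would hang off it. */
struct dm_crtc_state {
	struct drm_crtc_state base;
	void *constructed_pipes;	/* e.g. what resource.c picked */
};

static struct drm_crtc_state *dm_crtc_duplicate_state(struct drm_crtc *crtc)
{
	struct dm_crtc_state *state = kzalloc(sizeof(*state), GFP_KERNEL);

	if (!state)
		return NULL;

	/* Core copies the generic fields; driver data is added on top. */
	__drm_atomic_helper_crtc_duplicate_state(crtc, &state->base);
	return &state->base;
}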

5) Why is a midlayer bad?
I'm not going to go into specifics on the DC midlayer, but we abhor
midlayers for a fair few reasons. The main reason I find causes the
most issues is locking. When you have breaks in code flow between
multiple layers, with layers calling back into previous layers,
it becomes near impossible to track who owns the locking and what the
current locking state is.

Consider
     drma -> dca -> dcb -> drmb
     drmc -> dcc  -> dcb -> drmb

We have two code paths that go back into drmb, now maybe drma has a
lock taken, but drmc doesn't, but we've no indication when we hit drmb
of what the context was before entering the DC layer. This causes all
kinds of problems. The main requirement is the driver maintains the
execution flow as much as possible. The only callback behaviour should
be from irq or workqueue type situations where you've handed
execution flow to the hardware to do something and it is getting back
to you. The pattern we use to get out of this sort of hole is helper
libraries, we structure code as much as possible as leaf nodes that
don't call back into the parents if we can avoid it (we don't always
succeed).

Okay.  By the way, DC does behave like a helper for the most part.  There is
no locking in DC.  We work with enough different OSes to know they all have
different synchronization primitives and interrupt handling, and having DC
lock anything is just shooting ourselves in the foot.  We do have functions
with "lock" in their name in DC, but those are HW register locks to ensure
the HW registers update atomically, i.e. have 50 register writes latch in
HW at the next vsync to ensure the HW state changes on a vsync boundary.
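
A sketch of what those lock functions mean in HW terms (the register names
and offsets below are made up for illustration): nothing here is a software
mutex; writes between lock and unlock are latched together by the HW on the
next vsync.

#include <linux/io.h>
#include <linux/types.h>

#define HYPOTHETICAL_UPDATE_LOCK	0x0000	/* placeholder offsets */
#define HYPOTHETICAL_SURF_ADDR_LO	0x0004
#define HYPOTHETICAL_SURF_ADDR_HI	0x0008
#define HYPOTHETICAL_SURF_PITCH		0x000c

static void program_surface_atomically(void __iomem *mmio,
				       u32 lo, u32 hi, u32 pitch)
{
	writel(1, mmio + HYPOTHETICAL_UPDATE_LOCK);	/* open atomic group */

	writel(lo,    mmio + HYPOTHETICAL_SURF_ADDR_LO);
	writel(hi,    mmio + HYPOTHETICAL_SURF_ADDR_HI);
	writel(pitch, mmio + HYPOTHETICAL_SURF_PITCH);

	writel(0, mmio + HYPOTHETICAL_UPDATE_LOCK);	/* latch at next vsync */
}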

So the above might become
    drma-> dca_helper
            -> dcb_helper
            -> drmb.

In this case the code flow is controlled by drma, dca/dcb might be
modifying data or setting hw state, but when we get to drmb it's easy
to see what data it needs and what locking.
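
If I read that right, the shape being asked for is something like this sketch
(all names invented): the drm-level entry point owns the flow and the locks,
and the dc helpers are leaf routines that never call back up.

struct drm_level_state { int mode_changed; };	/* stand-ins for real structs */
struct hw_ctx { int dispclk_khz; };

/* Leaf helpers: they get the data they need as arguments, compute or
 * program HW, and return without calling back into the caller. */
void dca_helper_compute(const struct drm_level_state *s, struct hw_ctx *hw);
void dcb_helper_program(const struct hw_ctx *hw);

void drma_commit(struct drm_level_state *s)
{
	struct hw_ctx hw = {};

	/* Locking lives here, at the top, where the flow is visible. */
	dca_helper_compute(s, &hw);
	dcb_helper_program(&hw);
	/* drmb-level work happens here with a known locking context. */
}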

DAL/DC goes against this in so many ways, and when I look at the code
I'm never sure where to even start pulling the thread to unravel it.

I don't know where we go against it.  In the case where we do call back into
DRM, for MST, we have

amdgpu_dm_atomic_commit (implement atomic_commit)
dc_commit_targets (commit helper)
dce110_apply_ctx_to_hw (hw_seq)
core_link_enable_stream (part of MST enable sequence)
allocate_mst_payload (helper for above func in same file)
dm_helpers_dp_mst_write_payload_allocation_table (glue code to call DRM)
drm_dp_mst_allocate_vcpi (DRM)

As you can see, even in this case we are only 6 levels deep before we call
back into DRM, and 2 of those functions are in the same file, as helper
functions of the bigger sequence.

Can you clarify the distinction between what you would call a midlayer vs. a
helper?  We consulted Alex a lot and we know about this inversion-of-control
pattern, and we are trying our best to follow it.  Is it the way functions
are named and the file/folder structure?  Would it help if we flattened
amdgpu_dm_atomic_commit and dc_commit_targets?  Even if we do, I would
imagine we want some helpers in commit rather than a giant 1000-line
function.  Is there any concern with putting dc_commit_targets under the /dc
folder, since we want other platforms to run the exact same helper?  Or is
this about dc_commit_targets being too big?  Or about the state being stored
in validate_context rather than drm_atomic_state?
Well, one area I hit today while looking is tracing the path for a dpcd
read or write.

An internal one in the dc layer goes

core_link_dpcd_read (core_link)
dm_helpers_dp_read_dpcd(context, dc_link)
   search connector list for the appropriate connector
   drm_dp_dpcd_read

Note the connector list searching: this is a case where you have called
back into the toplevel driver without the necessary info, because core_link
and dc_link are too far abstracted from the drm connector.
(get_connector_for_link is a bad idea)

Then we get back around through the aux stuff and end up at:
dc_read_dpcd which passes connector->dc_link->link_index down
    this looks up the dc_link again in core_dc->links[index]
dal_ddc_service_read_dpcd_data(link->ddc)
which calls into the i2caux path.

This is not helper functions or anything close, this is layering hell.
As per Harry's RFC (2nd still-to-do item), we are working on switching fully to the i2c / aux code provided by DRM; we just haven't got there yet. I would think this part will look similar to how the MST part looks once we have cleaned it up. By the way, anything with dal_* in the name is stuff we ported to get up and running quickly. dal_* has no business in dc and will be removed.
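
For what it's worth, hooking into DRM's aux helpers would look roughly like
the sketch below; the transfer callback body is a stub, not the real
amdgpu/DC code, so treat the details as assumptions:

#include <drm/drm_dp_helper.h>

/* Sketch: register one drm_dp_aux per connector and let DRM's
 * drm_dp_dpcd_read()/write() and MST code drive it directly. */
static ssize_t my_aux_transfer(struct drm_dp_aux *aux,
			       struct drm_dp_aux_msg *msg)
{
	/* The real implementation would hand msg->request, msg->address
	 * and msg->buffer to the HW aux engine and wait for the reply. */
	return msg->size;
}

static int my_connector_init_aux(struct drm_dp_aux *aux, struct device *dev)
{
	aux->name = "hypothetical DP aux";
	aux->dev = dev;
	aux->transfer = my_aux_transfer;
	return drm_dp_aux_register(aux);
}
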
I don't think it makes sense for DRM to get into how we decide to use our HW
blocks.  For example, any refactor done in core should not result in us using
a different pipeline to drive the same config.  We would like to have control
over how our HW pipeline is constructed.
I don't think the DRM wants to get involved at that level, but it would be good
if we could collapse the mountains of functions and layers so that you can
clearly see how a modeset happens all the way down to the hw in a linear
fashion.
There are really not that many layers. If you look at the MST example, we hit registers in 5 levels:

amdgpu_dm_atomic_commit
dc_commit_targets
dce110_apply_ctx_to_hw
core_link_enable_stream
allocate_mst_payload (same as in the drm callback example above)
dce110_stream_encoder_set_mst_bandwidth
REG_SET


How do you plan on dealing with people rewriting or removing code
upstream that is redundant in the kernel, but required for internal
stuff?


Honestly, I don't know what those would be.  Like you and Jerome removing the
func ptr abstraction (I know it was bad; that was one of the components we
ported from Windows), where we need to keep it as function pointers so we can
still run our code on FPGA before we see first silicon?  I don't think us
nak'ing the function ptr removal would be a problem for the community.  The
rest is valued and we took it with open arms.

Or is this more like we'd have code duplication after DRM adds some
functionality we could use?  I would imagine it's more a matter of moving
what we got working in our code into DRM core once we are upstreamed, and we
have no problem accommodating that, as code moved out to DRM core can be
included on other platforms.  We don't have any private ioctls today and we
don't plan to have any outside of using DRM object properties.
I've just sent some patches to remove a bunch of dpcd defines, that is just
one small example.
All of them are great for us to merge except patch 1/8, "dc: remove dc hub". As you might have guessed, that function is for an ASIC currently in the lab. Maybe we should have sanitized it with #ifdef and not had it visible upstream in the first place.

I really don't know what those new Linux things could be that would cause us
problems.  If anything, the new things will probably come from us if we are
upstreamed.
But until then there will be competing development upstream, and you might
want to merge things.
Maybe. Somehow my gut feeling is that either we will have those new things in demo-able shape before competing development starts, like FreeSync, or it's something everybody cares about and needs SW ecosystem support, like HDR. In the HDR case we are more than happy to participate up front.

DP MST:  AMD was the first source certified, and we worked closely with the
first branch device certified.  I was a part of that team and we had a very
solid implementation.  If we had been upstreamed, I don't see why you would
want to reinvent the wheel rather than try to massage what we have into shape
for DRM core for other drivers to reuse.
Definitely, I hate writing MST code, and it would have been good if someone else
had gotten to it first.

So I think after looking more at it, my major issue is with DC, the
core stuff,
Let me clarify: you mean the stuff under /dc/core? I think there is a path for us to have those subclass DRM objects.
not the hw-touching stuff, but the layering stuff: dc and core
infrastructure in a lot of places calls into the DM layer and back into
itself. It's a bit of a tangle to pull any one thread of it and try to
unravel it.
I think we only have dpcd/i2c left, which we said we will fix.

There also seem to be quite a few headers of questionable value; I've found
the same set of defines (or pretty close ones) in a few headers,

Redundant headers will go; we just need to spend time going through them. Most of them are leftovers from the DAL port.
conversion functions between different layer definitions, etc. There are
redundant header files, unused structs, structs of questionable value, or
structs that should be merged.

Stuff is hidden between dc and core structs, but it isn't always obvious why
stuff is in dc_link vs core_link. Ideally we'd lose some of that layering.
Okay, we will probably end up subclassing DRM objects anyway.

Also, things like loggers, fixed-point calculators, and vector code probably
need to be bumped up a layer or two, or made completely generic and put
outside the DC code; if code is in the amd/display dir it should be display
code.

The stuff under dc/basic consists of quick ports to get us going, and you probably already noticed we don't use some of it. By the way, is there any problem with using float to do our bandwidth_calc? We can save/restore the FPU context on x86. bandwidth_calc would run a lot faster, and we could make it somewhat readable and get rid of fixpt31_32.c.
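
By save/restore I mean the kernel FPU context; on x86 that would look
something like this sketch (the calculation itself is a placeholder, not the
real bandwidth formula):

#include <asm/fpu/api.h>

/* Sketch only: kernel code cannot use the FPU freely, so float math
 * has to be bracketed by kernel_fpu_begin()/kernel_fpu_end(), and the
 * code in between must not sleep. */
static int calc_bandwidth_mbps(int pixels_per_second, int bytes_per_pixel)
{
	float mbps;

	kernel_fpu_begin();
	mbps = (float)pixels_per_second * bytes_per_pixel / 1000000.0f;
	kernel_fpu_end();

	return (int)mbps;
}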

I'm going to be happily ignoring most of this until early next year at
this point (I might jump in/out a few times)
but I think Daniel and Alex have a pretty good handle on where this
code should be going to get upstream, I think we should
all be listening to them as much as possible.

Dave.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



