[RFC] Using DC in amdgpu for upcoming GPU

harry.wentland@xxxxxxx (Harry Wentland) · Thu, 8 Dec 2016 09:33:25 -0500

Hi Daniel,

just a quick clarification in-line about "validation" inside atomic_commit.

On 2016-12-08 04:59 AM, Daniel Vetter wrote:
> Hi Harry,
>
> On Wed, Dec 07, 2016 at 09:02:13PM -0500, Harry Wentland wrote:
>> We propose to use the Display Core (DC) driver for display support on
>> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
>> avoid a flag day the plan is to only support uGPU initially and transition
>> to older ASICs gradually.
>>
>> The DC component has received extensive testing within AMD for DCE8, 10, and
>> 11 GPUs and is being prepared for uGPU. Support should be better than
>> amdgpu's current display support.
>>
>>  * All of our QA effort is focused on DC
>>  * All of our CQE effort is focused on DC
>>  * All of our OEM preloads and custom engagements use DC
>>  * DC behavior mirrors what we do for other OSes
>>
>> The new asic utilizes a completely re-designed atom interface, so we cannot
>> easily leverage much of the existing atom-based code.
>>
>> We've introduced DC to the community earlier in 2016 and received a fair
>> amount of feedback. Some of what we've addressed so far are:
>>
>>  * Self-contain ASIC specific code. We did a bunch of work to pull
>>    common sequences into dc/dce and leave ASIC specific code in
>>    separate folders.
>>  * Started to expose AUX and I2C through generic kernel/drm
>>    functionality and are mostly using that. Some of that code is still
>>    needlessly convoluted. This cleanup is in progress.
>>  * Integrated Dave and Jeromeâ??s work on removing abstraction in bios
>>    parser.
>>  * Retire adapter service and asic capability
>>  * Remove some abstraction in GPIO
>>
>> Since a lot of our code is shared with pre- and post-silicon validation
>> suites changes need to be done gradually to prevent breakages due to a major
>> flag day.  This, coupled with adding support for new asics and lots of new
>> feature introductions means progress has not been as quick as we would have
>> liked. We have made a lot of progress none the less.
>>
>> The remaining concerns that were brought up during the last review that we
>> are working on addressing:
>>
>>  * Continue to cleanup and reduce the abstractions in DC where it
>>    makes sense.
>>  * Removing duplicate code in I2C and AUX as we transition to using the
>>    DRM core interfaces.  We can't fully transition until we've helped
>>    fill in the gaps in the drm core that we need for certain features.
>>  * Making sure Atomic API support is correct.  Some of the semantics of
>>    the Atomic API were not particularly clear when we started this,
>>    however, that is improving a lot as the core drm documentation
>>    improves.  Getting this code upstream and in the hands of more
>>    atomic users will further help us identify and rectify any gaps we
>>    have.
>>
>> Unfortunately we cannot expose code for uGPU yet. However refactor / cleanup
>> work on DC is public.  We're currently transitioning to a public patch
>> review. You can follow our progress on the amd-gfx mailing list. We value
>> community feedback on our work.
>>
>> As an appendix I've included a brief overview of the how the code currently
>> works to make understanding and reviewing the code easier.
>>
>> Prior discussions on DC:
>>
>>  * https://lists.freedesktop.org/archives/dri-devel/2016-March/103398.html
>>  *
>> https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html
>>
>> Current version of DC:
>>
>>  * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Once Alex pulls in the latest patches:
>>
>>  * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Best Regards,
>> Harry
>>
>>
>> ************************************************
>> *** Appendix: A Day in the Life of a Modeset ***
>> ************************************************
>>
>> Below is a high-level overview of a modeset with dc. Some of this might be a
>> little out-of-date since it's based on my XDC presentation but it should be
>> more-or-less the same.
>>
>> amdgpu_dm_atomic_commit()
>> {
>>   /* setup atomic state */
>>   drm_atomic_helper_prepare_planes(dev, state);
>>   drm_atomic_helper_swap_state(dev, state);
>>   drm_atomic_helper_update_legacy_modeset_state(dev, state);
>>
>>   /* create or remove targets */
>>
>>   /********************************************************************
>>    * *** Call into DC to commit targets with list of all known targets
>>    ********************************************************************/
>>   /* DC is optimized not to do anything if 'targets' didn't change. */
>>   dc_commit_targets(dm->dc, commit_targets, commit_targets_count)
>>   {
>>     /******************************************************************
>>      * *** Build context (function also used for validation)
>>      ******************************************************************/
>>     result = core_dc->res_pool->funcs->validate_with_context(
>>                                core_dc,set,target_count,context);
>
> I can't dig into details of DC, so this is not a 100% assessment, but if
> you call a function called "validate" in atomic_commit, you're very, very
> likely breaking atomic. _All_ validation must happen in ->atomic_check,
> if that's not the case TEST_ONLY mode is broken. And atomic userspace is
> relying on that working.
>

This function is not really named correctly. What it does is it builds a 
context and validates at the same time. In commit we simply care that it 
builds the context. Validate should never fail here (since this was 
already validated in atomic_check).

We call the same function at atomic_check

amdgpu_dm_atomic_check ->
	dc_validate_resources ->
		core_dc->res_pool->funcs->validate_with_context

As for the rest, I hear you and appreciate your feedback. Let me get 
back to you on that later.

Thanks,
Harry

> The only thing that you're allowed to return from ->atomic_commit is
> out-of-memory, hw-on-fire and similar unforeseen and catastrophic issues.
> Kerneldoc expklains this.
>
> Now the reason I bring this up (and we've discussed it at length in
> private) is that DC still suffers from a massive abstraction midlayer. A
> lot of the back-end stuff (dp aux, i2c, abstractions for allocation,
> timers, irq, ...) have been cleaned up, but the midlayer is still there.
> And I understand why you have it, and why it's there - without some OS
> abstraction your grand plan of a unified driver across everything doesn't
> work out so well.
>
> But in a way the backend stuff isn't such a big deal. It's annoying since
> lots of code, and bugfixes have to be duplicated and all that, but it's
> fairly easy to fix case-by-case, and as long as AMD folks stick around
> (which I fully expect) not a maintainance issue. It makes it harder for
> others to contribute, but then since it's mostly the leaf it's generally
> easy to just improve the part you want to change (as an outsider). And if
> you want to improve shared code the only downside is that you can't also
> improve amd, but that's not so much a problem for non-amd folks ;-)
>
> The problem otoh with the abstraction layer between drm core and the amd
> driver is that you can't ignore if you want to refactor shared code. And
> because it's an entire world of its own, it's much harder to understand
> what the driver is doing (without reading it all). Some examples of what I
> mean:
>
> - All other drm drivers subclass drm objects (by embedding them) into the
>   corresponding hw part that most closely matches the drm object's
>   semantics. That means even when you have 0 clue about how a given piece
>   of hw works, you have a reasonable chance of understanding code. If it's
>   all your own stuff you always have to keep in minde the special amd
>   naming conventions. That gets old real fast if you trying to figure out
>   what 20+ (or are we at 30 already?) drivers are doing.
>
> - This is even more true for atomic. Atomic has a pretty complicated
>   check/commmit transactional model for updating display state. It's a
>   standardized interface, and it's extensible, and we want generic
>   userspace to be able to run on any driver. Fairly often we realize that
>   semantics of existing or newly proposed properties and state isn't
>   well-defined enough, and then we need to go&read all the drivers and
>   figure out how to fix up the mess. DC has it's entirely separate state
>   structures which again don't subclass the atomic core structures (afaik
>   at least). Again the same problems apply that you can't find things, and
>   that figuring out the exact semantics and spotting differences in
>   behaviour is almost impossible.
>
> - The trouble isn't just in reading code and understanding it correctly,
>   it's also in finding it. If you have your own completely different world
>   then just finding the right code is hard - cscope and grep fail to work.
>
> - Another issue is that very often we unify semantics in drivers by adding
>   some new helpers that at least dtrt for most of the drivers. If you have
>   your own world then the impendance mismatch will make sure that amd
>   drivers will have slightly different semantics, and I think that's not
>   good for the ecosystem and kms - people want to run a lot more than just
>   a boot splash with generic kms userspace, stuff like xf86-video-$vendor
>   is going out of favour heavily.
>
> Note that all this isn't about amd walking away and leaving an
> unmaintainable mess behind. Like I've said I don't think this is a big
> risk. The trouble is that having your own world makes it harder for
> everyone else to understand the amd driver, and understanding all drivers
> is very often step 1 in some big refactoring or feature addition effort.
> Because starting to refactor without understanding the problem generally
> doesn't work ;_) And you can't make this step 1 easier for others by
> promising to always maintain DC and update it to all the core changes,
> because that's only step 2.
>
> In all the DC discussions we've had thus far I haven't seen anyone address
> this issue. And this isn't just an issue in drm, it's pretty much
> established across all linux subsystems with the "no midlayer or OS
> abstraction layers in drivers" rule. There's some real solid reasons why
> such a HAl is extremely unpopular with upstream. And I haven't yet seen
> any good reason why amd needs to be different, thus far it looks like a
> textbook case, and there's been lots of vendors in lots of subsystems who
> tried to push their HAL.
>
> Thanks, Daniel
>
>>
>>     /******************************************************************
>>      * *** Apply safe power state
>>      ******************************************************************/
>>     pplib_apply_safe_state(core_dc);
>>
>>     /****************************************************************
>>      * *** Apply the context to HW (program HW)
>>      ****************************************************************/
>>     result = core_dc->hwss.apply_ctx_to_hw(core_dc,context)
>>     {
>>       /* reset pipes that need reprogramming */
>>       /* disable pipe power gating */
>>       /* set safe watermarks */
>>
>>       /* for all pipes with an attached stream */
>>         /************************************************************
>>          * *** Programming all per-pipe contexts
>>          ************************************************************/
>>         status = apply_single_controller_ctx_to_hw(...)
>>         {
>>           pipe_ctx->tg->funcs->set_blank(...);
>>           pipe_ctx->clock_source->funcs->program_pix_clk(...);
>>           pipe_ctx->tg->funcs->program_timing(...);
>>           pipe_ctx->mi->funcs->allocate_mem_input(...);
>>           pipe_ctx->tg->funcs->enable_crtc(...);
>>           bios_parser_crtc_source_select(...);
>>
>>           pipe_ctx->opp->funcs->opp_set_dyn_expansion(...);
>>           pipe_ctx->opp->funcs->opp_program_fmt(...);
>>
>>           stream->sink->link->link_enc->funcs->setup(...);
>>           pipe_ctx->stream_enc->funcs->dp_set_stream_attribute(...);
>>           pipe_ctx->tg->funcs->set_blank_color(...);
>>
>>           core_link_enable_stream(pipe_ctx);
>>           unblank_stream(pipe_ctx,
>>
>>           program_scaler(dc, pipe_ctx);
>>         }
>>       /* program audio for all pipes */
>>       /* update watermarks */
>>     }
>>
>>     program_timing_sync(core_dc, context);
>>     /* for all targets */
>>       target_enable_memory_requests(...);
>>
>>     /* Update ASIC power states */
>>     pplib_apply_display_requirements(...);
>>
>>     /* update surface or page flip */
>>   }
>> }
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>