Hi Daniel,

Just a quick clarification in-line about "validation" inside atomic_commit.

On 2016-12-08 04:59 AM, Daniel Vetter wrote:
> Hi Harry,
>
> On Wed, Dec 07, 2016 at 09:02:13PM -0500, Harry Wentland wrote:
>> We propose to use the Display Core (DC) driver for display support on
>> AMD's upcoming GPU (referred to as uGPU in the rest of the doc). In order to
>> avoid a flag day, the plan is to only support uGPU initially and transition
>> to older ASICs gradually.
>>
>> The DC component has received extensive testing within AMD for DCE8, 10, and
>> 11 GPUs and is being prepared for uGPU. Support should be better than
>> amdgpu's current display support.
>>
>>  * All of our QA effort is focused on DC
>>  * All of our CQE effort is focused on DC
>>  * All of our OEM preloads and custom engagements use DC
>>  * DC behavior mirrors what we do for other OSes
>>
>> The new asic utilizes a completely re-designed atom interface, so we cannot
>> easily leverage much of the existing atom-based code.
>>
>> We introduced DC to the community earlier in 2016 and received a fair
>> amount of feedback. Some of what we've addressed so far:
>>
>>  * Self-contained ASIC-specific code. We did a bunch of work to pull
>>    common sequences into dc/dce and leave ASIC-specific code in
>>    separate folders.
>>  * Started to expose AUX and I2C through generic kernel/drm
>>    functionality and are mostly using that. Some of that code is still
>>    needlessly convoluted. This cleanup is in progress.
>>  * Integrated Dave and Jerome's work on removing abstraction in the bios
>>    parser.
>>  * Retired the adapter service and asic capability code.
>>  * Removed some abstraction in GPIO.
>>
>> Since a lot of our code is shared with pre- and post-silicon validation
>> suites, changes need to be done gradually to prevent breakages due to a major
>> flag day. This, coupled with adding support for new asics and lots of new
>> feature introductions, means progress has not been as quick as we would have
>> liked. We have made a lot of progress nonetheless.
>>
>> The remaining concerns that were brought up during the last review, which we
>> are working on addressing:
>>
>>  * Continue to clean up and reduce the abstractions in DC where it
>>    makes sense.
>>  * Remove duplicate code in I2C and AUX as we transition to using the
>>    DRM core interfaces. We can't fully transition until we've helped
>>    fill in the gaps in the drm core that we need for certain features.
>>  * Make sure Atomic API support is correct. Some of the semantics of
>>    the Atomic API were not particularly clear when we started this;
>>    however, that is improving a lot as the core drm documentation
>>    improves. Getting this code upstream and into the hands of more
>>    atomic users will further help us identify and rectify any gaps we
>>    have.
>>
>> Unfortunately we cannot expose code for uGPU yet. However, refactor / cleanup
>> work on DC is public. We're currently transitioning to a public patch
>> review. You can follow our progress on the amd-gfx mailing list. We value
>> community feedback on our work.
>>
>> As an appendix I've included a brief overview of how the code currently
>> works to make understanding and reviewing the code easier.
>>
>> Prior discussions on DC:
>>
>>  * https://lists.freedesktop.org/archives/dri-devel/2016-March/103398.html
>>  * https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html
>>
>> Current version of DC:
>>
>>  * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Once Alex pulls in the latest patches:
>>
>>  * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Best Regards,
>> Harry
>>
>>
>> ************************************************
>> *** Appendix: A Day in the Life of a Modeset ***
>> ************************************************
>>
>> Below is a high-level overview of a modeset with dc. Some of this might be a
>> little out-of-date since it's based on my XDC presentation, but it should be
>> more-or-less the same.
>>
>> amdgpu_dm_atomic_commit()
>> {
>>   /* setup atomic state */
>>   drm_atomic_helper_prepare_planes(dev, state);
>>   drm_atomic_helper_swap_state(dev, state);
>>   drm_atomic_helper_update_legacy_modeset_state(dev, state);
>>
>>   /* create or remove targets */
>>
>>   /********************************************************************
>>    * *** Call into DC to commit targets with list of all known targets
>>    ********************************************************************/
>>   /* DC is optimized not to do anything if 'targets' didn't change. */
>>   dc_commit_targets(dm->dc, commit_targets, commit_targets_count)
>>   {
>>     /******************************************************************
>>      * *** Build context (function also used for validation)
>>      ******************************************************************/
>>     result = core_dc->res_pool->funcs->validate_with_context(
>>                      core_dc, set, target_count, context);
>
> I can't dig into details of DC, so this is not a 100% assessment, but if
> you call a function called "validate" in atomic_commit, you're very, very
> likely breaking atomic. _All_ validation must happen in ->atomic_check;
> if that's not the case, TEST_ONLY mode is broken. And atomic userspace is
> relying on that working.
>

This function is not really named correctly. What it does is build a
context and validate it at the same time. In commit we simply care that it
builds the context; validation should never fail here, since the state was
already validated in atomic_check. We call the same function from
atomic_check:

  amdgpu_dm_atomic_check ->
    dc_validate_resources ->
      core_dc->res_pool->funcs->validate_with_context
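
To sketch the intended split (the dm-side helper names and wiring below are
illustrative only, not the exact interfaces in the tree):

#include <drm/drm_atomic_helper.h>
#include <drm/drm_crtc.h>

/* Illustrative wrapper: builds a DC context from @state via
 * dc_validate_resources() -> validate_with_context() and returns a
 * negative error code if the hw cannot support the requested state. */
static int dm_dc_validate(struct drm_device *dev,
                          struct drm_atomic_state *state);

static int amdgpu_dm_atomic_commit(struct drm_device *dev,
                                   struct drm_atomic_state *state,
                                   bool nonblock);

/*
 * ->atomic_check is the only place where validation may fail.  It is
 * also what backs the DRM_MODE_ATOMIC_TEST_ONLY flag for userspace.
 */
static int amdgpu_dm_atomic_check(struct drm_device *dev,
                                  struct drm_atomic_state *state)
{
        int ret;

        /* Core DRM checks: modeset/plane/connector state consistency. */
        ret = drm_atomic_helper_check(dev, state);
        if (ret)
                return ret;

        /* Reject the commit up front if DC can't build a valid context. */
        return dm_dc_validate(dev, state);
}

static const struct drm_mode_config_funcs amdgpu_dm_mode_funcs = {
        .atomic_check  = amdgpu_dm_atomic_check,
        .atomic_commit = amdgpu_dm_atomic_commit,
};

In atomic_commit the same context-building path runs again through
dc_commit_targets(), but a validation failure there would indicate a driver
bug rather than something reported back to userspace.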

As for the rest, I hear you and appreciate your feedback. Let me get back
to you on that later.

Thanks,
Harry

> The only thing that you're allowed to return from ->atomic_commit is
> out-of-memory, hw-on-fire and similar unforeseen and catastrophic issues.
> Kerneldoc explains this.
>
> Now the reason I bring this up (and we've discussed it at length in
> private) is that DC still suffers from a massive abstraction midlayer. A
> lot of the back-end stuff (dp aux, i2c, abstractions for allocation,
> timers, irq, ...) has been cleaned up, but the midlayer is still there.
> And I understand why you have it, and why it's there - without some OS
> abstraction your grand plan of a unified driver across everything doesn't
> work out so well.
>
> But in a way the backend stuff isn't such a big deal. It's annoying since
> lots of code and bugfixes have to be duplicated and all that, but it's
> fairly easy to fix case-by-case, and as long as AMD folks stick around
> (which I fully expect) not a maintenance issue. It makes it harder for
> others to contribute, but since it's mostly leaf code it's generally
> easy to just improve the part you want to change (as an outsider). And if
> you want to improve shared code the only downside is that you can't also
> improve amd, but that's not so much a problem for non-amd folks ;-)
>
> The problem otoh with the abstraction layer between drm core and the amd
> driver is that you can't ignore it if you want to refactor shared code. And
> because it's an entire world of its own, it's much harder to understand
> what the driver is doing (without reading it all). Some examples of what I
> mean:
>
> - All other drm drivers subclass drm objects (by embedding them) into the
>   corresponding hw part that most closely matches the drm object's
>   semantics. That means even when you have 0 clue about how a given piece
>   of hw works, you have a reasonable chance of understanding code. If it's
>   all your own stuff you always have to keep in mind the special amd
>   naming conventions. That gets old real fast if you're trying to figure out
>   what 20+ (or are we at 30 already?) drivers are doing.
>
> - This is even more true for atomic. Atomic has a pretty complicated
>   check/commit transactional model for updating display state. It's a
>   standardized interface, and it's extensible, and we want generic
>   userspace to be able to run on any driver. Fairly often we realize that
>   the semantics of existing or newly proposed properties and state aren't
>   well-defined enough, and then we need to go & read all the drivers and
>   figure out how to fix up the mess. DC has its entirely separate state
>   structures which again don't subclass the atomic core structures (afaik
>   at least). Again the same problems apply: you can't find things, and
>   figuring out the exact semantics and spotting differences in behaviour
>   is almost impossible.
>
> - The trouble isn't just in reading code and understanding it correctly,
>   it's also in finding it. If you have your own completely different world
>   then just finding the right code is hard - cscope and grep fail to work.
>
> - Another issue is that very often we unify semantics in drivers by adding
>   some new helpers that at least do the right thing for most of the
>   drivers. If you have your own world then the impedance mismatch will make
>   sure that amd drivers will have slightly different semantics, and I think
>   that's not good for the ecosystem and kms - people want to run a lot more
>   than just a boot splash with generic kms userspace, and stuff like
>   xf86-video-$vendor is going out of favour heavily.
>
> Note that all this isn't about amd walking away and leaving an
> unmaintainable mess behind. Like I've said I don't think this is a big
> risk. The trouble is that having your own world makes it harder for
> everyone else to understand the amd driver, and understanding all drivers
> is very often step 1 in some big refactoring or feature addition effort.
> Because starting to refactor without understanding the problem generally
> doesn't work ;-) And you can't make this step 1 easier for others by
> promising to always maintain DC and update it to all the core changes,
> because that's only step 2.
>
> In all the DC discussions we've had thus far I haven't seen anyone address
> this issue. And this isn't just an issue in drm, it's pretty much
> established across all linux subsystems with the "no midlayer or OS
> abstraction layers in drivers" rule. There are some real solid reasons why
> such a HAL is extremely unpopular with upstream. And I haven't yet seen
> any good reason why amd needs to be different; thus far it looks like a
> textbook case, and there's been lots of vendors in lots of subsystems who
> tried to push their HAL.
>
> Thanks, Daniel
>
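
For reference, the subclassing-by-embedding Daniel describes above looks
roughly like this in other drm drivers (a simplified sketch; the dm-side
type and member names are illustrative, not necessarily what amdgpu has
today):

#include <drm/drm_crtc.h>

/* The drm core object is embedded in the driver's hw-specific object. */
struct amdgpu_crtc {
        struct drm_crtc base;
        /* hw-/DC-specific fields live here */
};

#define to_amdgpu_crtc(c) container_of(c, struct amdgpu_crtc, base)

/* The same applies to atomic state: driver state subclasses core state. */
struct dm_crtc_state {
        struct drm_crtc_state base;
        /* driver-private per-commit state lives here */
};

#define to_dm_crtc_state(s) container_of(s, struct dm_crtc_state, base)

Anyone who knows the core structures can then find and follow the driver
code with grep/cscope, even without knowing the hw, and the atomic core
keeps managing the lifetime of the state objects.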

>>
>>     /******************************************************************
>>      * *** Apply safe power state
>>      ******************************************************************/
>>     pplib_apply_safe_state(core_dc);
>>
>>     /****************************************************************
>>      * *** Apply the context to HW (program HW)
>>      ****************************************************************/
>>     result = core_dc->hwss.apply_ctx_to_hw(core_dc, context)
>>     {
>>       /* reset pipes that need reprogramming */
>>       /* disable pipe power gating */
>>       /* set safe watermarks */
>>
>>       /* for all pipes with an attached stream */
>>       /************************************************************
>>        * *** Programming all per-pipe contexts
>>        ************************************************************/
>>       status = apply_single_controller_ctx_to_hw(...)
>>       {
>>         pipe_ctx->tg->funcs->set_blank(...);
>>         pipe_ctx->clock_source->funcs->program_pix_clk(...);
>>         pipe_ctx->tg->funcs->program_timing(...);
>>         pipe_ctx->mi->funcs->allocate_mem_input(...);
>>         pipe_ctx->tg->funcs->enable_crtc(...);
>>         bios_parser_crtc_source_select(...);
>>
>>         pipe_ctx->opp->funcs->opp_set_dyn_expansion(...);
>>         pipe_ctx->opp->funcs->opp_program_fmt(...);
>>
>>         stream->sink->link->link_enc->funcs->setup(...);
>>         pipe_ctx->stream_enc->funcs->dp_set_stream_attribute(...);
>>         pipe_ctx->tg->funcs->set_blank_color(...);
>>
>>         core_link_enable_stream(pipe_ctx);
>>         unblank_stream(pipe_ctx, ...);
>>
>>         program_scaler(dc, pipe_ctx);
>>       }
>>       /* program audio for all pipes */
>>       /* update watermarks */
>>     }
>>
>>     program_timing_sync(core_dc, context);
>>     /* for all targets */
>>     target_enable_memory_requests(...);
>>
>>     /* Update ASIC power states */
>>     pplib_apply_display_requirements(...);
>>
>>     /* update surface or page flip */
>>   }
>> }
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>