On Mon, Dec 12, 2016 at 11:10:30PM -0500, Cheng, Tony wrote: > Thanks for the write up for the guide. We can definitely re-do atomic > according to guideline provided as I am not satified with how our code look > today. To me it seems more like we need to shuffle stuff around and rename > a few things than rewrite much of anything. > > I hope to get an answer on the reply to Dave's question regarding to if > there is anything else. If we can keep most of the stuff under /dc as the > "back end" helper and do most of the change under /amdgpu_dm then it isn't > that difficult as we don't need to go deal with the fall out on other > platforms. Again it's not just windows. We are fully aware that it's hard > to find the common abstraction between all different OS so we try our best > to have DC behave more like a helper than abstraction layer anyways. In our > design states and policies are domain of Display Managers (DM) and because > of linux we also say anything DRM can do that's also domain of DM. We don't > put anything in DC that we don't feel comfortable if HW decide to hide it in > FW. > > > On 12/12/2016 9:33 PM, Harry Wentland wrote: > > On 2016-12-11 03:28 PM, Daniel Vetter wrote: > > > On Wed, Dec 07, 2016 at 09:02:13PM -0500, Harry Wentland wrote: > > > > We propose to use the Display Core (DC) driver for display support on > > > > AMD's upcoming GPU (referred to by uGPU in the rest of the doc). > > > > In order to > > > > avoid a flag day the plan is to only support uGPU initially and > > > > transition > > > > to older ASICs gradually. > > > > > > Bridgeman brought it up a few times that this here was the question > > > - it's > > > kinda missing a question mark, hard to figure this out ;-). I'd say for > > > > My bad for the missing question mark (imprecise phrasing). On the other > > hand letting this blow over a bit helped get us on the map a bit more > > and allows us to argue the challenges (and benefits) of open source. :) > > > > > upstream it doesn't really matter, but imo having both atomic and > > > non-atomic paths in one driver is one world of hurt and I strongly > > > recommend against it, at least if feasible. All drivers that switched > > > switched in one go, the only exception was i915 (it took much longer > > > than > > > we ever feared, causing lots of pain) and nouveau (which only converted > > > nv50+, but pre/post-nv50 have always been two almost completely separate > > > worlds anyway). > > > > > > trust me we would like to upstream everything. Just we didn't invest enough > in DC code in the previous generation so the quality might not be there. > > > You mention the two probably most complex DRM drivers didn't switch in a > > single go... I imagine amdgpu/DC falls into the same category. > > > > I think one of the problems is making a sudden change with a fully > > validated driver without breaking existing use cases and customers. We > > really should've started DC development in public and probably would do > > that if we had to start anew. > > > > > > The DC component has received extensive testing within AMD for > > > > DCE8, 10, and > > > > 11 GPUs and is being prepared for uGPU. Support should be better than > > > > amdgpu's current display support. > > > > > > > > * All of our QA effort is focused on DC > > > > * All of our CQE effort is focused on DC > > > > * All of our OEM preloads and custom engagements use DC > > > > * DC behavior mirrors what we do for other OSes > > > > > > > > The new asic utilizes a completely re-designed atom interface, > > > > so we cannot > > > > easily leverage much of the existing atom-based code. > > > > > > > > We've introduced DC to the community earlier in 2016 and > > > > received a fair > > > > amount of feedback. Some of what we've addressed so far are: > > > > > > > > * Self-contain ASIC specific code. We did a bunch of work to pull > > > > common sequences into dc/dce and leave ASIC specific code in > > > > separate folders. > > > > * Started to expose AUX and I2C through generic kernel/drm > > > > functionality and are mostly using that. Some of that code is still > > > > needlessly convoluted. This cleanup is in progress. > > > > * Integrated Dave and Jerome’s work on removing abstraction in bios > > > > parser. > > > > * Retire adapter service and asic capability > > > > * Remove some abstraction in GPIO > > > > > > > > Since a lot of our code is shared with pre- and post-silicon validation > > > > suites changes need to be done gradually to prevent breakages > > > > due to a major > > > > flag day. This, coupled with adding support for new asics and > > > > lots of new > > > > feature introductions means progress has not been as quick as we > > > > would have > > > > liked. We have made a lot of progress none the less. > > > > > > > > The remaining concerns that were brought up during the last > > > > review that we > > > > are working on addressing: > > > > > > > > * Continue to cleanup and reduce the abstractions in DC where it > > > > makes sense. > > > > * Removing duplicate code in I2C and AUX as we transition to using the > > > > DRM core interfaces. We can't fully transition until we've helped > > > > fill in the gaps in the drm core that we need for certain features. > > > > * Making sure Atomic API support is correct. Some of the semantics of > > > > the Atomic API were not particularly clear when we started this, > > > > however, that is improving a lot as the core drm documentation > > > > improves. Getting this code upstream and in the hands of more > > > > atomic users will further help us identify and rectify any gaps we > > > > have. > > > > > > Ok so I guess Dave is typing some more general comments about > > > demidlayering, let me type some guidelines about atomic. Hopefully this > > > all materializes itself a bit better into improved upstream docs, > > > but meh. > > > > > > > Excellent writeup. Let us know when/if you want our review for upstream > > docs. > > > > We'll have to really take some time to go over our atomic > > implementation. A couple small comments below with regard to DC. > > > > > Step 0: Prep > > > > > > So atomic is transactional, but it's not validate + rollback or commit, > > > but duplicate state, validate and then either throw away or commit. > > > There's a few big reasons for this: a) partial atomic updates - if you > > > duplicate it's much easier to check that you have all the right locks b) > > > kfree() is much easier to check for correctness than a rollback code and > > > c) atomic_check functions are much easier to audit for invalid > > > changes to > > > persistent state. > > > > > > > There isn't really any rollback. I believe even in our other drivers > > we've abandoned the rollback approach years ago because it doesn't > > really work on modern HW. Any rollback cases you might find in DC should > > really only be for catastrophic errors (read: something went horribly > > wrong... read: congratulations, you just found a bug). > > > There is no rollback. We moved to "atomic" for Windows Vista in the > previous DAL 8 years ago. Windows only care about VidPnSource (frame > buffer) and VidPnTarget (display output) and leave the rest up to driver but > we had to behave atomic as Window obsolutely "check" every possible config > with the famous EnumConfunctionalModality DDI. > > > > Trouble is that this seems a bit unusual compared to all other > > > approaches, > > > and ime (from the drawn-out i915 conversion) you really don't want > > > to mix > > > things up. Ofc for private state you can roll back (e.g. vc4 does > > > that for > > > the drm_mm allocator thing for scanout slots or whatever it is), but > > > it's > > > trivial easy to accidentally check the wrong state or mix them up or > > > something else bad. > > > > > > Long story short, I think step 0 for DC is to split state from objects, > > > i.e. for each dc_surface/foo/bar you need a > > > dc_surface/foo/bar_state. And > > > all the back-end functions need to take both the object and the state > > > explicitly. > > > > > > This is a bit a pain to do, but should be pretty much just > > > mechanical. And > > > imo not all of it needs to happen before DC lands in upstream, but see > > > above imo that half-converted state is postively horrible. This should > > > also not harm cross-os reuse at all, you can still store things together > > > on os where that makes sense. > > > > > > Guidelines for amdgpu atomic structures > > > > > > drm atomic stores everything in state structs on plane/connector/crtc. > > > This includes any property extensions or anything else really, the > > > entire > > > userspace abi is built on top of this. Non-trivial drivers are > > > supposed to > > > subclass these to store their own stuff, so e.g. > > > > > > amdgpu_plane_state { > > > struct drm_plane_state base; > > > > > > /* amdgpu glue state and stuff that's linux-specific, e.g. > > > * property values and similar things. Note that there's strong > > > * push towards standardizing properties and stroing them in the > > > * drm_*_state structs. */ > > > > > > struct dc_surface_state surface_state; > > > > > > /* other dc states that fit to a plane */ > > > }; > > > > Is there any requirement where the header and code that deal with > dc_surface_state has to be? Can we keep it under /dc while > amdgpu_plane_state exist under /amdgpu_dm? None. And my proposal here with having dc_*_state structures for the dc block, and fairly separate amdgpu_*_state blocks to bind it into drm is exaclty to facilitate this split, so dc_* stuff would still entirely live in dc/ (and hopefully shared with everyone else), while amdgpu would be the linux glue. 1 > > > Yes not everything will fit 1:1 in one of these, but to get started I > > > strongly recommend to make them fit (maybe with reduced feature sets to > > > start out). Stuff that is shared between e.g. planes, but always on the > > > same crtc can be put into amdgpu_crtc_state, e.g. if you have > > > scalers that > > > are assignable to a plane. > > > > > > Of course atomic also supports truly global resources, for that you need > > > to subclass drm_atomic_state. Currently msm and i915 do that, and > > > probably > > > best to read those structures as examples until I've typed the docs. > > > But I > > > expect that especially for planes a few dc_*_state structs will stay in > > > amdgpu_*_state. > > > > We need to treat most of resource that don't map well as global. One example > is pixel pll. We have 6 display pipes but only 2 or 3 plls in CI/VI, as a > result we are limited in number of HDMI or DVI we can drive at the same > time. Also the pixel pll can be used to drive DP as well, so there is > another layer of HW specific but we can't really contain it in crtc or > encoder by itself. Doing this resource allocation require knowlege of the > whole system, and knowning which pixel pll is already used, and what can we > support with remaining pll. Same on i915. Other stuff we currently treat as global are the overall clocks&bandwidth/latency needs, plus all the fetch fifo settings and latencies (because they depend in complicated ways on everything else). But e.g. the fetch latency and bw needed for each plane is computed in the plane check code. Scalers otoh are per-crtc on intel. > Another ask is lets say we are driving 2 displays, we would always want > instance 0 and instance 1 of scaler, timing generator etc getting used. We > want to avoid possiblity of due to different user mode commit sequence we > end up with driving the 2 display with 0 and 2nd instance of HW. Not only > this configuration isn't really validated in the lab, we will be less > effective in power gating as instance 0 and 1 are one the same tile. > instead of having 2/3 of processing pipeline silicon power gated we can only > power gate 1/3. And if we power gate wrong the you will have 1 of the 2 > display not lighting up. Just implement some bias in which shared resoures your prefer for which crtc. Also note that with atomic you can always add more drm objeccts to the commit. So if you've put yourself into a corner (for power optimizatino reasons), but then userspace wants to light up more displays, and you want to reassing the resources for the already enabled outputs (e.g. when not all clocks are the same and only some support really high clocks). We do that on intel when we need to change the display core clock, since that means recomputing everything (and you can't change the display core clock while a display is on). > Having HW resource used the same way on all platform under any sequence / > circumstance is important for us, as power optimization/measure is done for > given platform + display config mostly on only 1 OS by the HW team. Yeah, should all be possible with atomic. And I think with some work it should be possible to keep that selection logic for shared resources in the shared code, even with this redesign. > > > Guidelines for atomic_check > > > > > > Please use the helpers as much as makes sense, and put at least the > > > basic > > > steps that from drm_*_state into the respective dc_*_state functional > > > block into the helper callbacks for that object. I think basic > > > validation > > > of individal bits (as much as possible, e.g. if you just don't support > > > e.g. scaling or rotation with certain pixel formats) should happen in > > > there too. That way when we e.g. want to check how drivers corrently > > > validate a given set of properties to be able to more strictly > > > define the > > > semantics, that code is easy to find. > > > > > > Also I expect that this won't result in code duplication with other OS, > > > you need code to map from drm to dc anyway, might as well > > > check&reject the > > > stuff that dc can't even represent right there. > > > > > > The other reason is that the helpers are good guidelines for some of the > > > semantics, e.g. it's mandatory that drm_crtc_needs_modeset gives the > > > right > > > answer after atomic_check. If it doesn't, then you're driver doesn't > > > follow atomic. If you completely roll your own this becomes much > > > harder to > > > assure. > > > > it doesn't today and we have equilvant check in dc in our hw_seq. We will > look into how to make it work. Our "atomic" operate on always knowing the > current state (core_dc.current_ctx) and finding out the delta between the > desired future state computed in our dc_validate. One thing we were > stuggling with it seems DRM is building up incremental state, ie. if > something isn't mentioned in atomic_commit then you don't touch it. We > operate in a mode where if something isn't mentioned in dc_commit_target we > disable those output. this method allow us to always know current and > future state, as future state is built up by caller (amdgpu), and we are > able to transition into the future state on vsync boundary if required. It > seems to me that drm_*_state require us to compartimentize states. It won't > be as trivial to fill the input for bandwidth_calc as that beast need > everything as everything end up goes through the same memory controller. > our validate_context is specifically design to make it easy to generate > input parameter for bandwidth_calc. Per pipe validate like pixel format, > scaling is not a problem. Hm, that needs to be fixed. Atom also gives you both old and new state, but only for state objects which are changed. You can just go around and add everything (see example above), but you should _only_ do that when necessary. One design goal of atomic is that when you do a modeset on an 2nd display then page-flip should continue to work (assuming the hw can do it) on the 1st display without any stalls. We have fine-grained locking for this, but if you always need all the state then you defeat that point. Of course this is a bit tricky if you have lots of complicated shared state. The way we solve this is by pushing copies of relevant data from planes/crtc down to the shared resources. This way you end up with a read-only copy, and as long as those derived values don't change the independtly running pageflip loop won't need to stall for your modeset. And the modeset code can still look at the data, without grabbing the full update lock. This is probably going to be a bit of a rework, so for starters would make sense to only aim to have parallel flips (without any modesets). That still means you need to be careful with grabbing global states. > > Interesting point. Not sure if we've checked that. Is there some sort of > > automated test for this that we can use to check? > > > > > Of course extend it all however you want, e.g. by adding all the global > > > optimization and resource assignment stuff after initial per-object > > > checking has been done using the helper infrastructure. > > > > > > Guidelines for atomic_commit > > > > > > Use the new nonblcoking helpers. Everyone who didn't got it wrong. Also, > > > > I believe we're not using those and didn't start with those which might > > explain (along with lack of discussion on dri-devel) why atomic > > currently looks the way it does in DC. This is definitely one of the > > bigger issues we'd want to clean up and where you wouldn't find much > > pushback, other than us trying to find time to do it. > > > > > your atomic_commit should pretty much match the helper one, except for a > > > custom swap_state to handle all your globally shared specia dc_*_state > > > objects. Everything hw specific should be in atomic_commit_tail. > > > > > > Wrt the hw commit itself, for the modeset step just roll your own. > > > That's > > > the entire point of atomic, and atm both i915 and nouveau exploit this > > > fully. Besides a bit of glue there shouldn't be much need for > > > linux-specific code here - what you need is something to fish the right > > > dc_*_state objects and give it your main sequencer functions. What you > > > should make sure though is that only ever do a modeset when that was > > > signalled, i.e. please use drm_crtc_needs_modeset to control that part. > > > Feel free to wrap up in a dc_*_needs_modeset for better abstraction if > > > that's needed. > > > > Using state properly will solve our double resource assignment/validation > problem during commit. Thanks for the guidance on how to do this. > > now the question is can we have a helper function to house the main sequence > and put it in /dc? Yeah, that's my proposal. You probably need some helper functions and iterator macros on the overall dc_state (or amdgpu_state, whatever you call it) so that your helper function can walk all the state objects correctly. And only those which are part of the state (see above for why this is important), but sharing that overall commit logic should be possible. > > > I do strongly suggest however that you implement the plane commit using > > > the helpers. There's really only a few ways to implement this in the hw, > > > and it should work everywhere. > > > > Maybe from SW perspective I'll look at the intel code to understand this. > In terms of HW I would have to say I disagree with that. The multi-plane > blend stuff even in our HW we have gone through 1 minor revision and 1 major > change. Also the same HW is build to handle stereo 3D, multi-plane > blending, pipe spliting and more. The pipeline / blending stuff tend to > change in HW because HW need to constently design to meet the timing > requirement of ever increasing pixel rate to keep us competitive. When HW > can't meet timing they employ the split trick and have 2 copy of the same HW > to be able to push through that many number of pixel. If we are Intel and > on latest process node then we probably won't have this problem. I beat our > 2018 HW will change again especially things are moving toward 64bpp FP16 > pixel format by default for HDR. None of this matters for atomic multi-plane commit. There's about 3 ways to do that: - GO bit that you set to signal to the hw the new state that it should commit on the next vblank. - vblank inhibit bit (works like GO inverted, but doesn't auto-clear). - vblank evasion in software. I haven't seen anything else in 20+ atomic drivers. None of this has anything to do with the features your planes and blending engine support. And the above is somewhat interesting to know because it matters for how you send out the completion event for the atomic commit correctly, which again is part of the uabi contract. Hence why I think it makes sense to consider these strongly, even though they're fairly deeply nested - they are again a part of the glue that binds DC into the linux world. Cheers, Daniel > > > Misc guidelines > > > > > > Use the suspend/resume helpers. If your atomic can't do that, it's not > > > terribly good. Also, if DC can't make those fit, it's probably still too > > > much midlayer and its own world than helper library. > > > > > > > Do they handle swapping DP displays while the system is asleep? If not > > we'll probably need to add that. The other case where we have some > > special handling has to do with headless (sleep or resume, don't > > remember). > > > > > Use all the legacy helpers, again your atomic should be able to pull it > > > off. One exception is async plane flips (both primary and cursors), > > > that's > > > atm still unsolved. Probably best to keep the old code around for just > > > that case (but redirect to the compat helpers for everything), see e.g. > > > how vc4 implements cursors. > > > > > > > Good old flip. There probably isn't much shareable code between OSes > > here. It seems like every OS rolls there own thing, regarding flips. We > > still seem to be revisiting flips regularly, especially with FreeSync > > (adaptive sync) in the mix now. Good to know that this is still a bit of > > an open topic. > > > > > Most imporant of all > > > > > > Ask questions on #dri-devel. amdgpu atomic is the only nontrivial atomic > > > driver for which I don't remember a single discussion about some detail, > > > at least not with any of the DAL folks. Michel&Alex asked some questions > > > sometimes, but that indirection is bonghits and the defeats the point of > > > upstream: Direct cross-vendor collaboration to get shit done. Please > > > make > > > it happen. > > > > > > > Please keep asking us to get on dri-devel with questions. I need to get > > into the habit again of leaving the IRC channel open. I think most of us > > are still a bit scared of it or don't know how to deal with some of the > > information overload (IRC and mailing list). It's some of my job to > > change that all the while I'm learning this myself. :) > > > > Thanks for all your effort trying to get people involved. > > > > > Oh and I pretty much assume Harry&Tony are volunteered to review atomic > > > docs ;-) > > > > > > > Sure. > > > > Cheers, > > Harry > > > > > Cheers, Daniel > > > > > > > > > > > > > > > > > Unfortunately we cannot expose code for uGPU yet. However > > > > refactor / cleanup > > > > work on DC is public. We're currently transitioning to a public patch > > > > review. You can follow our progress on the amd-gfx mailing list. > > > > We value > > > > community feedback on our work. > > > > > > > > As an appendix I've included a brief overview of the how the > > > > code currently > > > > works to make understanding and reviewing the code easier. > > > > > > > > Prior discussions on DC: > > > > > > > > * > > > > https://lists.freedesktop.org/archives/dri-devel/2016-March/103398.html > > > > * > > > > https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html > > > > > > > > > > > > Current version of DC: > > > > > > > > * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7 > > > > > > > > Once Alex pulls in the latest patches: > > > > > > > > * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7 > > > > > > > > Best Regards, > > > > Harry > > > > > > > > > > > > ************************************************ > > > > *** Appendix: A Day in the Life of a Modeset *** > > > > ************************************************ > > > > > > > > Below is a high-level overview of a modeset with dc. Some of > > > > this might be a > > > > little out-of-date since it's based on my XDC presentation but > > > > it should be > > > > more-or-less the same. > > > > > > > > amdgpu_dm_atomic_commit() > > > > { > > > > /* setup atomic state */ > > > > drm_atomic_helper_prepare_planes(dev, state); > > > > drm_atomic_helper_swap_state(dev, state); > > > > drm_atomic_helper_update_legacy_modeset_state(dev, state); > > > > > > > > /* create or remove targets */ > > > > > > > > /******************************************************************** > > > > * *** Call into DC to commit targets with list of all known targets > > > > ********************************************************************/ > > > > /* DC is optimized not to do anything if 'targets' didn't change. */ > > > > dc_commit_targets(dm->dc, commit_targets, commit_targets_count) > > > > { > > > > /****************************************************************** > > > > * *** Build context (function also used for validation) > > > > ******************************************************************/ > > > > result = core_dc->res_pool->funcs->validate_with_context( > > > > core_dc,set,target_count,context); > > > > > > > > /****************************************************************** > > > > * *** Apply safe power state > > > > ******************************************************************/ > > > > pplib_apply_safe_state(core_dc); > > > > > > > > /**************************************************************** > > > > * *** Apply the context to HW (program HW) > > > > ****************************************************************/ > > > > result = core_dc->hwss.apply_ctx_to_hw(core_dc,context) > > > > { > > > > /* reset pipes that need reprogramming */ > > > > /* disable pipe power gating */ > > > > /* set safe watermarks */ > > > > > > > > /* for all pipes with an attached stream */ > > > > /************************************************************ > > > > * *** Programming all per-pipe contexts > > > > ************************************************************/ > > > > status = apply_single_controller_ctx_to_hw(...) > > > > { > > > > pipe_ctx->tg->funcs->set_blank(...); > > > > pipe_ctx->clock_source->funcs->program_pix_clk(...); > > > > pipe_ctx->tg->funcs->program_timing(...); > > > > pipe_ctx->mi->funcs->allocate_mem_input(...); > > > > pipe_ctx->tg->funcs->enable_crtc(...); > > > > bios_parser_crtc_source_select(...); > > > > > > > > pipe_ctx->opp->funcs->opp_set_dyn_expansion(...); > > > > pipe_ctx->opp->funcs->opp_program_fmt(...); > > > > > > > > stream->sink->link->link_enc->funcs->setup(...); > > > > pipe_ctx->stream_enc->funcs->dp_set_stream_attribute(...); > > > > pipe_ctx->tg->funcs->set_blank_color(...); > > > > > > > > core_link_enable_stream(pipe_ctx); > > > > unblank_stream(pipe_ctx, > > > > > > > > program_scaler(dc, pipe_ctx); > > > > } > > > > /* program audio for all pipes */ > > > > /* update watermarks */ > > > > } > > > > > > > > program_timing_sync(core_dc, context); > > > > /* for all targets */ > > > > target_enable_memory_requests(...); > > > > > > > > /* Update ASIC power states */ > > > > pplib_apply_display_requirements(...); > > > > > > > > /* update surface or page flip */ > > > > } > > > > } > > > > > > > > > > > > _______________________________________________ > > > > dri-devel mailing list > > > > dri-devel@xxxxxxxxxxxxxxxxxxxxx > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel