On Tue, Feb 15, 2022 at 01:26:50PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 15, 2022 at 01:02:48PM +0200, Lisovskiy, Stanislav wrote:
> > On Tue, Feb 15, 2022 at 12:10:19PM +0200, Ville Syrjälä wrote:
> > > On Tue, Feb 15, 2022 at 10:59:57AM +0200, Lisovskiy, Stanislav wrote:
> > > > On Mon, Feb 14, 2022 at 10:26:39PM +0200, Ville Syrjälä wrote:
> > > > > On Mon, Feb 14, 2022 at 07:03:05PM +0200, Lisovskiy, Stanislav wrote:
> > > > > > On Mon, Feb 14, 2022 at 12:24:57PM +0200, Ville Syrjälä wrote:
> > > > > > > On Mon, Feb 14, 2022 at 12:05:36PM +0200, Lisovskiy, Stanislav wrote:
> > > > > > > > On Mon, Feb 14, 2022 at 11:18:07AM +0200, Ville Syrjala wrote:
> > > > > > > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > If the only thing that is changing is SAGV vs. no SAGV but
> > > > > > > > > the number of active planes and the total data rates end up
> > > > > > > > > unchanged we currently bail out of intel_bw_atomic_check()
> > > > > > > > > early and forget to actually compute the new QGV point
> > > > > > > > > mask and thus won't actually enable/disable SAGV as requested.
> > > > > > > > > This ends up poorly if we end up running with SAGV enabled
> > > > > > > > > when we shouldn't. Usually ends up in underruns.
> > > > > > > > > To fix this let's go through the QGV point mask computation
> > > > > > > > > if anyone else already added the bw state for us.
> > > > > > > >
> > > > > > > > Haven't been looking at this in a while. Even though we have been
> > > > > > > > looking at quite a few revisions together, there are still some bugs :(
> > > > > > > >
> > > > > > > > I thought SAGV vs No SAGV can't change if active planes
> > > > > > > > or data rate didn't change? Because it means we probably
> > > > > > > > still have same ddb allocations, which means SAGV state
> > > > > > > > will just stay the same.
> > > > > > >
> > > > > > > SAGV can change due to watermarks/ddb allocations.
> > > > > > > The easiest way to trip this up is to try to use the async flip
> > > > > > > wm0/ddb optimization. That immediately forgets to turn off SAGV
> > > > > > > and we get underruns, which is how I noticed this. And I don't
> > > > > > > immediately see any easy proof that this couldn't also happen
> > > > > > > due to some other plane changes.
> > > > > >
> > > > > > That's the way it was initially implemented even before SAGV was added.
> > > > >
> > > > > Yeah, it wasn't a problem as long as SAGV was not enabled.
> > > > >
> > > > > > I think it dates back to when the very first bw check was implemented.
> > > > > >
> > > > > > commit c457d9cf256e942138a54a2e80349ee7fe20c391
> > > > > > Author: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > > Date: Fri May 24 18:36:14 2019 +0300
> > > > > >
> > > > > >     drm/i915: Make sure we have enough memory bandwidth on ICL
> > > > > >
> > > > > > +int intel_bw_atomic_check(struct intel_atomic_state *state)
> > > > > > +{
> > > > > > +	struct drm_i915_private *dev_priv = to_i915(state->base.dev);
> > > > > > +	struct intel_crtc_state *new_crtc_state, *old_crtc_state;
> > > > > > +	struct intel_bw_state *bw_state = NULL;
> > > > > > +	unsigned int data_rate, max_data_rate;
> > > > > > +	unsigned int num_active_planes;
> > > > > > +	struct intel_crtc *crtc;
> > > > > > +	int i;
> > > > > > +
> > > > > > +	/* FIXME earlier gens need some checks too */
> > > > > > +	if (INTEL_GEN(dev_priv) < 11)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
> > > > > > +					    new_crtc_state, i) {
> > > > > > +		unsigned int old_data_rate =
> > > > > > +			intel_bw_crtc_data_rate(old_crtc_state);
> > > > > > +		unsigned int new_data_rate =
> > > > > > +			intel_bw_crtc_data_rate(new_crtc_state);
> > > > > > +		unsigned int old_active_planes =
> > > > > > +			intel_bw_crtc_num_active_planes(old_crtc_state);
> > > > > > +		unsigned int new_active_planes =
> > > > > > +			intel_bw_crtc_num_active_planes(new_crtc_state);
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * Avoid locking the bw state when
> > > > > > +		 * nothing significant has changed.
> > > > > > +		 */
> > > > > > +		if (old_data_rate == new_data_rate &&
> > > > > > +		    old_active_planes == new_active_planes)
> > > > > > +			continue;
> > > > > > +
> > > > > > +		bw_state = intel_atomic_get_bw_state(state);
> > > > > > +		if (IS_ERR(bw_state))
> > > > > > +			return PTR_ERR(bw_state);
> > > > > >
> > > > > > However, what can cause watermarks/ddb to change, besides plane state change
> > > > > > and/or active planes change? We change watermarks, when we change ddb allocations
> > > > > > and we change ddb allocations when active planes had changed and/or data rate
> > > > > > had changed.
> > > > >
> > > > > The bw code only cares about the aggregate numbers from all the planes.
> > > > > The planes could still change in some funny way where eg. some plane
> > > > > frees up some bandwidth, but the other planes gobble up the exact same
> > > > > amount and thus the aggregate numbers the bw atomic check cares about
> > > > > do not change but the watermarks/ddb do.
> > > > >
> > > > > And as mentioned, the async flip wm0/ddb optimization makes this trivial
> > > > > to trip up since it will want to disable SAGV as there is not enough ddb
> > > > > for the SAGV watermark. And async flip specifically isn't even allowed
> > > > > to change anything that would affect the bandwidth utilization, and neither
> > > > > is it allowed to enable/disable planes.
> > > >
> > > > I think the whole idea of setting ddb to minimum in case of async flip optimization
> > > > was purely our idea - BSpec/HSD only mentions forbidding wm levels > 0 in case of async
> > > > flip, however there is nothing about limiting ddb allocations.
> > >
> > > Reducing just the watermark doesn't really make sense
> > > if the goal is to keep the DBUF level to a minimum.
> > > Also I don't think there are any proper docs for this thing. The
> > > only thing we have just has some vague notes about using
> > > "minimum watermarks", whatever that means.
> >
> > Was it the goal? I thought limiting watermarks would by itself also
> > limit package C states, thus affecting memory clocks and latency.
> > Because it really doesn't say anything about keeping Dbuf allocations
> > to a minimum.
>
> The goal is to minimize the amount of data in the FIFO.
>
> >
> > >
> > > >
> > > > Was a bit suspicious about that whole change, to be honest - and yep, now it seems to
> > > > cause some unexpected side effects.
> > >
> > > The bw_state vs. SAGV bug is there regardless of the wm0 optimization.
> >
> > I agree there is a bug. The bug is such that initial bw checks were relying
> > on total data rate + active planes comparison, while they should have accounted
> > for per-plane data rate usage.
> >
> > This should have been changed in SAGV patches, but probably had gone
> > unnoticed both by you and me.
> >
> > >
> > > Also the SAGV watermark is not the minimum watermark (if that is
> > > what the doc really means by that), the normal WM0 is the minimum watermark.
> > > So even if we interpret the doc to say that we should just disable all
> > > watermark levels except the smallest one (normal WM0) without changing
> > > the ddb allocations we would still end up disabling SAGV.
> >
> > That's actually a good question. Did they mean, disable all "regular" wm levels
> > or the SAGV one also? Probably they meant what you say, but would be nice to know
> > exactly.
>
> They said neither. It's just "program minimum watermarks" which
> could mean anything really. They do explicitly say "DBUF level
> can also adversely affect flip performance." which I think is
> the whole point of this exercise.
>
> >
> > Anyway my point here is that, we probably shouldn't use new_bw_state as a way to
> > check that plane allocations had changed. That's just confusing.
> >
> We are not checking if plane allocations have changed. We are
> trying to determine if anything in the bw_state has changed.
> If we have said state already then something in it may have
> changed and we have to recalculate anything that may depend
> on those changed things, namely pipe_sagv_reject->qgv_point_mask.

I think it is just not very intuitive that we use whether or not we can
get new_bw_state as a way to check if something has changed. It would be
nice to put that in some kind of wrapper like "has_new_bw_state" or
"bw_state_changed".

For anyone not quite familiar with the state paradigm we use, it looks
pretty confusing that we first get new_bw_state using
intel_atomic_get_new_bw_state and then immediately override it with
intel_atomic_get_bw_state, while whether we can get new_bw_state or not
just acts as a check that nothing has changed in bw_state.

Moreover, ideally intel_bw_atomic_check should probably handle all that
SAGV stuff as well, i.e. I would suggest moving the pipe_reject_mask
setting, based on the skl_compute_wm results, to that function. I don't
see any issue here, because in skl_compute_wm we just calculate the SAGV
wm, and then in intel_bw_atomic_check we just call intel_compute_sagv_mask,
which then calls tgl_crtc_can_enable_sagv for each crtc and sets this mask.

I think by doing this in intel_bw_atomic_check we would achieve both what
you wanted to do and make it more obvious why things are happening that
way.

Stan

>
> I think ideally we'd not even modify the bw_state directly from the
> watermark code and we'd instead defer that to bw atomic check entirely.
> But this SAGV vs. DDB business is your typical chicken vs. egg situation,
> so I'm not sure that is possible to do. Would need to spend a few minutes
> thinking about it I guess.
>
> >
> > Maybe for you as i915 guru, that's obvious, however not for someone else, who might
> > touch the code and we are doing open source here.
> >
> > Can we just add some check which explicitly does per plane data rate checks?
>
> There is nothing interesting about per-plane data rates.
>
> > So that we now bail out from that first cycle not only depending on whether
> > total_data_rate/active planes had changed, but we also check per plane data rate?
> > That might actually save us also in future, if we ever get into such situation, when
> > bw_state doesn't change, but ddb allocations do.
> >
> > I know you might say it shouldn't happen, but there is always some new stuff coming.
> >
> > Stan
> >
> > >
> > > > Also we are now forcing the recalculation to be done always no matter what and using
> > > > new bw state for that in a bit counterintuitive way, which I don't like.
> > > > Not even sure that will always work, as we are not guaranteed to get a non-NULL
> > > > new_bw_state object from calling intel_atomic_get_new_bw_state, for that purpose we
> > > > typically call intel_atomic_get_bw_state, which is supposed to do that and it's called only
> > > > here and in case of CDCLK recalculation, which is called in intel_cdclk_atomic_check and
> > > > done right after this one.
> > >
> > > If there is no bw_state then bw_state->pipe_sagv_reject can't have
> > > changed and there is nothing to recalculate.
> > >
> > > >
> > > > So if we haven't called intel_atomic_get_bw_state beforehand, which we didn't because there are
> > > > 2 places, where new bw state was supposed to be created to be usable by intel_atomic_get_new_bw_state
> > > > - I think, we will (or might) get a NULL here, because intel_atomic_get_bw_state hasn't been called yet.
> > >
> > > Yes, NULL is perfectly fine.
> > >
> > > --
> > > Ville Syrjälä
> > > Intel
>
> --
> Ville Syrjälä
> Intel
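
For reference, the "aggregate numbers stay the same while the planes
change" case discussed above can be shown with a small standalone
sketch. This is only a toy model, not i915 code: struct toy_crtc_state
and all toy_* helpers are made-up names, the data rates are arbitrary
units, and the per-plane comparison is just the idea floated in this
thread, not something that was agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PLANES 4

/* Toy stand-in for one CRTC's planes; NOT the real intel_crtc_state. */
struct toy_crtc_state {
	unsigned int plane_data_rate[TOY_MAX_PLANES]; /* 0 = plane disabled */
};

/* Sum of all plane data rates (the "total data rate" aggregate). */
static unsigned int toy_total_data_rate(const struct toy_crtc_state *s)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < TOY_MAX_PLANES; i++)
		total += s->plane_data_rate[i];
	return total;
}

/* Number of planes with a non-zero data rate. */
static unsigned int toy_num_active_planes(const struct toy_crtc_state *s)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < TOY_MAX_PLANES; i++)
		if (s->plane_data_rate[i])
			count++;
	return count;
}

/* Same shape as the early-bailout comparison in the quoted loop. */
static bool toy_aggregate_changed(const struct toy_crtc_state *old_s,
				  const struct toy_crtc_state *new_s)
{
	return toy_total_data_rate(old_s) != toy_total_data_rate(new_s) ||
	       toy_num_active_planes(old_s) != toy_num_active_planes(new_s);
}

/* Hypothetical stricter comparison: true if any single plane differs. */
static bool toy_per_plane_changed(const struct toy_crtc_state *old_s,
				  const struct toy_crtc_state *new_s)
{
	size_t i;

	for (i = 0; i < TOY_MAX_PLANES; i++)
		if (old_s->plane_data_rate[i] != new_s->plane_data_rate[i])
			return true;
	return false;
}

/* Two planes swap their data rates: aggregates match, planes do not. */
static void toy_demo(void)
{
	const struct toy_crtc_state old_s = { .plane_data_rate = { 300, 100, 0, 0 } };
	const struct toy_crtc_state new_s = { .plane_data_rate = { 100, 300, 0, 0 } };

	assert(!toy_aggregate_changed(&old_s, &new_s));
	assert(toy_per_plane_changed(&old_s, &new_s));
}
```

In toy_demo() the aggregate comparison reports "nothing changed" even
though each plane's rate did change, which is the situation where the
early bailout would skip anything that depends on per-plane state.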