On Thu, Apr 07, 2022 at 03:09:48PM +0300, Govindapillai, Vinod wrote:
> On Thu, 2022-04-07 at 09:43 +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> > > On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > > > In configurations with a single DRAM channel, for usecases like
> > > > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. It looks
> > > > > > > > like the wm0 watermark values need to be bumped up, because the wm0
> > > > > > > > memory latency calculations are probably not taking the DRAM
> > > > > > > > channel's impact into account.
> > > > > > > >
> > > > > > > > As per Bspec 49325, if the ddb allocation can hold at least
> > > > > > > > one plane_blocks_per_line, we should have selected method2.
> > > > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > > > at least one line, set the wm blocks equivalent to the blocks
> > > > > > > > per line.
> > > > > > > >
> > > > > > > > cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@xxxxxxxxx>
> > > > > > > >
> > > > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@xxxxxxxxx>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > > > > > >  		}
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > > +	/*
> > > > > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > > > > +	 *
> > > > > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > > > > +	 *
> > > > > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > > > > +	 * wm calculations.
> > > > > > > > +	 */
> > > > > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > > > +		max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > > > +		      fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > > > +		fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > >
> > > > > > > That looks rather convoluted.
> > > > > > >
> > > > > > >  blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > +/* blah */
> > > > > > > +if (has_lines)
> > > > > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > > > >
> > > > > > We probably need to do similar refactoring in the whole function ;-)
> > > > > >
> > > > > > > Also since Art said nothing like this should actually be needed,
> > > > > > > I think the comment should make it a bit more clear that this
> > > > > > > is just a hack to work around the underruns with some single
> > > > > > > memory channel configurations.
> > > > > >
> > > > > > It is actually not quite a hack, because we are missing that condition
> > > > > > implementation from Bspec 49325, which instructs us to select method2
> > > > > > when the ddb blocks allocation is known and that ratio is >= 1.
> > > > >
> > > > > The ddb allocation is not yet known, so we're implementing the
> > > > > algorithm 100% correctly.
> > > > >
> > > > > And this patch does not implement that missing part anyway.
> > > >
> > > > Yes, as I understood, method2 would just give an amount of blocks
> > > > at least as large as the dbuf blocks per line.
> > > >
> > > > I wonder whether we should actually fully implement this Bspec clause
> > > > and add it at the point where the ddb allocation is known, or are there
> > > > any obstacles to doing that, besides having to reshuffle this function a bit?
> > >
> > > We need to calculate the wm to figure out how much ddb to allocate,
> > > and then we'd need the ddb allocation to figure out how to calculate
> > > the wm. Very much chicken vs. egg right there. We'd have to do some
> > > kind of hideous loop where we'd calculate everything twice.
> > > I don't
> > > really want to do that, since I'd actually like to move the wm
> > > calculation to happen already much earlier, during .check_plane(),
> > > as that could reduce the amount of redundant wm calculations we
> > > are currently doing.
> >
> > I might be missing some details right now, but why do we need a ddb
> > allocation to count wms?
> >
> > I thought it's like we usually calculate wm levels + min_ddb_allocation,
> > then based on that we allocate min_ddb + extra for each plane.
> > It is correct that by the moment we calculate wms we have only
> > min_ddb available, so if this level were even enabled, we would
> > at least need min_ddb blocks.
> >
> > I think we could just use that min_ddb value here for that purpose,
> > because the condition anyway checks if
> > (plane buffer allocation / plane blocks per line) >= 1, so even
> > if this wm level were enabled, the plane buffer allocation would
> > be at least min_ddb _or higher_. However, that won't affect this
> > condition, because even if it happens to be "plane buffer allocation
> > + some extra", the ratio would still be valid.
> > So if it holds for min_ddb / plane blocks per line, we can
> > probably safely state that it will also hold further on.
>
> min_ddb = 110% of the blocks calculated from the 2 methods (blocks + 10%)
> It depends on what method we choose. So I don't think we can use it for any assumptions.

Min_ddb is what matters for us because it is the actual ddb allocation we use, not the wm level.
As I understand it, the validity of (plane buffer allocation / plane blocks per line) >= 1 depends
only on whether the allocation can get lower than min_ddb after we do the full allocation in
skl_allocate_plane_ddb, and it can't be smaller than min_ddb. The allocation algorithm works in
such a way that it tries to allocate at least min_ddb; if it can't, the wm level would be disabled.
However, if it succeeds, it might try to add some extra blocks to the allocation
(see skl_allocate_plane_ddb).
So yes, even though we don't know the exact allocation in skl_compute_plane_wm, we can safely
assume it won't be less than min_ddb, which means that if min_ddb / plane_blocks_per_line >= 1
is true now, it will also be true later, if that wm level gets enabled at all.

Stan

> But in any case, I think this patch does not cause any harm in most of the usecases expected of
> skl+ platforms, which have enough dbuf!
>
> Per-plane ddb allocation happens based on the highest wm level whose min_ddb can fit into the
> allocation. If one level does not fit, then that level and the package C state transitions above
> it are disabled.
> Now if you look at the logic to select which method to use: if the latency >= linetime, we select
> the large buffer method, which guarantees that there are at least plane_blocks_per_line blocks.
> So I think we can safely assume that the latency for wake wm levels will mostly be higher, which
> implies using the "large buffer" method.
>
> So this change is mostly limited to wm0. And hence it should not impact ddb allocation, but the
> memory fetch bursts might happen slightly more frequently when the processor is in C0?
>
> BR
> vinod
>
> > Stan
> >
> > > --
> > > Ville Syrjälä
> > > Intel