On Tue, Mar 02, 2021 at 03:44:44PM +0300, Dmitry Osipenko wrote: > Display controller (DC) performs isochronous memory transfers, and thus, > has a requirement for a minimum memory bandwidth that shall be fulfilled, > otherwise framebuffer data can't be fetched fast enough and this results > in a DC's data-FIFO underflow that follows by a visual corruption. > > The Memory Controller drivers provide facility for memory bandwidth > management via interconnect API. Let's wire up the interconnect API > support to the DC driver in order to fix the distorted display output > on T30 Ouya, T124 TK1 and other Tegra devices. I did a read on the code. I have put some thoughts and nits inbetween the diff, but here are more general questions about the patch: Is there a reason why the bandwidth is allocated per plane instead of just using one peak and average for the whole configuration? Or is there a need to expose all the planes as interconnect channels and allocate their bandwidth individually? >From algorithmic part I see that the plane overlaps are calculated twice (A vs B and later B vs A). The cursor plane is ignored, but nevertheless its overlap mask is calculated before being thrown away. The bandwidths are also calculated twice: once for pre-commit and then again for post-commit. Is setting bandwidth for an interconnect expensive when re-setting a value that was already set? The code seems to avoid this case, but maybe unnecessarily? [...cut the big and interesting part...] [...] > @@ -65,7 +75,9 @@ struct tegra_dc_soc_info { > unsigned int num_overlay_formats; > const u64 *modifiers; > bool has_win_a_without_filters; > + bool has_win_b_vfilter_mem_client; > bool has_win_c_without_vert_filter; > + unsigned int plane_tiled_memory_bandwidth_x2; This looks more like bool in the code using it. [...] > --- a/drivers/gpu/drm/tegra/plane.c > +++ b/drivers/gpu/drm/tegra/plane.c [...] > +static int tegra_plane_check_memory_bandwidth(struct drm_plane_state *state) The function's body looks more like 'update' or 'recalculate' rather than 'check' the memory bandwidth. > + struct tegra_plane_state *tegra_state = to_tegra_plane_state(state); > + unsigned int i, bpp, bpp_plane, dst_w, src_w, src_h, mul; > + const struct tegra_dc_soc_info *soc; > + const struct drm_format_info *fmt; > + struct drm_crtc_state *crtc_state; > + u32 avg_bandwidth, peak_bandwidth; > + > + if (!state->visible) > + return 0; > + > + crtc_state = drm_atomic_get_new_crtc_state(state->state, state->crtc); > + if (!crtc_state) > + return -EINVAL; > + > + src_w = drm_rect_width(&state->src) >> 16; > + src_h = drm_rect_height(&state->src) >> 16; > + dst_w = drm_rect_width(&state->dst); > + > + fmt = state->fb->format; > + soc = to_tegra_dc(state->crtc)->soc; > + > + /* > + * Note that real memory bandwidth vary depending on format and > + * memory layout, we are not taking that into account because small > + * estimation error isn't important since bandwidth is rounded up > + * anyway. > + */ > + for (i = 0, bpp = 0; i < fmt->num_planes; i++) { > + bpp_plane = fmt->cpp[i] * 8; Nit: you could declare the bpp_plane here. > + /* > + * Sub-sampling is relevant for chroma planes only and vertical > + * readouts are not cached, hence only horizontal sub-sampling > + * matters. > + */ > + if (i > 0) > + bpp_plane /= fmt->hsub; > + > + bpp += bpp_plane; > + } > + > + /* > + * Horizontal downscale takes extra bandwidth which roughly depends > + * on the scaled width. > + */ > + if (src_w > dst_w) > + mul = (src_w - dst_w) * bpp / 2048 + 1; > + else > + mul = 1; Does it really need more bandwidth to scale down? Does it read the same data multiple times just to throw it away? > + /* average bandwidth in bytes/s */ > + avg_bandwidth = src_w * src_h * bpp / 8 * mul; > + avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode); > + > + /* mode.clock in kHz, peak bandwidth in kbit/s */ > + peak_bandwidth = crtc_state->mode.clock * bpp * mul; I would guess that (src_w * bpp / 8) needs rounding up unless the plane is continuous. Or you could just add the max rounding error here and get a safe overestimated value. Maybe this would be better done when summing per-plane widths. > + /* ICC bandwidth in kbyte/s */ > + peak_bandwidth = kbps_to_icc(peak_bandwidth); > + avg_bandwidth = Bps_to_icc(avg_bandwidth); This could be merged with the assignments after the following 'if' block. > + /* > + * Tegra30/114 Memory Controller can't interleave DC memory requests > + * and DC uses 16-bytes atom for the tiled windows, while DDR3 uses 32 > + * bytes atom. Hence there is x2 memory overfetch for tiled framebuffer > + * and DDR3 on older SoCs. > + */ > + if (soc->plane_tiled_memory_bandwidth_x2 && > + tegra_state->tiling.mode == TEGRA_BO_TILING_MODE_TILED) { > + peak_bandwidth *= 2; > + avg_bandwidth *= 2; > + } > + > + tegra_state->peak_memory_bandwidth = peak_bandwidth; > + tegra_state->avg_memory_bandwidth = avg_bandwidth; > + > + return 0; > +} [...] > +static const char * const tegra_plane_icc_names[] = { > + "wina", "winb", "winc", "", "", "", "cursor", > +}; > + > +int tegra_plane_interconnect_init(struct tegra_plane *plane) > +{ > + const char *icc_name = tegra_plane_icc_names[plane->index]; Is plane->index guaranteed to be <= 6? I would guess so, but maybe BUILD_BUG_ON(sizeof(icc_names)==TEGRA_DC_LEGACY_PLANES_NUM) or some other check could document this? [...] Best Regards Michał Mirosław