On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote: > Display controller (DC) performs isochronous memory transfers, and thus, > has a requirement for a minimum memory bandwidth that shall be fulfilled, > otherwise framebuffer data can't be fetched fast enough and this results > in a DC's data-FIFO underflow that follows by a visual corruption. [...] > +static unsigned long > +tegra_plane_overlap_mask(struct drm_crtc_state *state, > + const struct drm_plane_state *plane_state) > +{ > + const struct drm_plane_state *other_state; > + const struct tegra_plane *tegra; > + unsigned long overlap_mask = 0; > + struct drm_plane *plane; > + struct drm_rect rect; > + > + if (!plane_state->visible || !plane_state->fb) > + return 0; > + > + drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) { [...] > + /* > + * Data-prefetch FIFO will easily help to overcome temporal memory > + * pressure if other plane overlaps with the cursor plane. > + */ > + if (tegra_plane_is_cursor(plane_state) && overlap_mask) > + return 0; > + > + return overlap_mask; > +} Since for cursor plane this always returns 0, you could test tegra_plane_is_cursor() at the start of the function. > +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc, > + struct drm_atomic_state *state) [...] > + /* > + * For overlapping planes pixel's data is fetched for each plane at > + * the same time, hence bandwidths are accumulated in this case. > + * This needs to be taken into account for calculating total bandwidth > + * consumed by all planes. > + * > + * Here we get the overlapping state of each plane, which is a > + * bitmask of plane indices telling with what planes there is an > + * overlap. Note that bitmask[plane] includes BIT(plane) in order > + * to make further code nicer and simpler. > + */ > + drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) { > + tegra_state = to_const_tegra_plane_state(plane_state); > + tegra = to_tegra_plane(plane); > + > + if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM)) > + return -EINVAL; > + > + plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth; > + mask = tegra_plane_overlap_mask(new_state, plane_state); > + overlap_mask[tegra->index] = mask; > + > + if (hweight_long(mask) != 3) > + all_planes_overlap_simultaneously = false; > + } > + > + old_state = drm_atomic_get_old_crtc_state(state, crtc); > + old_dc_state = to_const_dc_state(old_state); > + > + /* > + * Then we calculate maximum bandwidth of each plane state. > + * The bandwidth includes the plane BW + BW of the "simultaneously" > + * overlapping planes, where "simultaneously" means areas where DC > + * fetches from the planes simultaneously during of scan-out process. > + * > + * For example, if plane A overlaps with planes B and C, but B and C > + * don't overlap, then the peak bandwidth will be either in area where > + * A-and-B or A-and-C planes overlap. > + * > + * The plane_peak_bw[] contains peak memory bandwidth values of > + * each plane, this information is needed by interconnect provider > + * in order to set up latency allowness based on the peak BW, see > + * tegra_crtc_update_memory_bandwidth(). > + */ > + for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) { > + overlap_bw = 0; > + > + for_each_set_bit(k, &overlap_mask[i], 3) { > + if (k == i) > + continue; > + > + if (all_planes_overlap_simultaneously) > + overlap_bw += plane_peak_bw[k]; > + else > + overlap_bw = max(overlap_bw, plane_peak_bw[k]); > + } > + > + new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw; > + > + /* > + * If plane's peak bandwidth changed (for example plane isn't > + * overlapped anymore) and plane isn't in the atomic state, > + * then add plane to the state in order to have the bandwidth > + * updated. > + */ > + if (old_dc_state->plane_peak_bw[i] != > + new_dc_state->plane_peak_bw[i]) { > + plane = tegra_crtc_get_plane_by_index(crtc, i); > + if (!plane) > + continue; > + > + plane_state = drm_atomic_get_plane_state(state, plane); > + if (IS_ERR(plane_state)) > + return PTR_ERR(plane_state); > + } > + } [...] Does it matter to which channel (plane) the peak bw is attached? Would it still work if the first channel specified max(peak_bw of overlaps) and others only zeroes? > + /* > + * Horizontal downscale needs a lower memory latency, which roughly > + * depends on the scaled width. Trying to tune latency of a memory > + * client alone will likely result in a strong negative impact on > + * other memory clients, hence we will request a higher bandwidth > + * since latency depends on bandwidth. This allows to prevent memory > + * FIFO underflows for a large plane downscales, meanwhile allowing > + * display to share bandwidth fairly with other memory clients. > + */ > + if (src_w > dst_w) > + mul = (src_w - dst_w) * bpp / 2048 + 1; > + else > + mul = 1; [...] One point is unexplained yet: why is the multiplier proportional to a *difference* between src and dst widths? Also, I would expect max (worst case) is pixclock * read_size when src_w/dst_w >= read_size. BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ... > + /* average bandwidth in bytes/s */ > + avg_bandwidth = (bpp * src_w * src_h * mul + 7) / 8; > + avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode); > + > + /* mode.clock in kHz, peak bandwidth in kbit/s */ > + peak_bandwidth = crtc_state->mode.clock * bpp * mul; [...] Best Regards Michał Mirosław