15.03.2021 01:31, Michał Mirosław пишет: > On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote: >> Display controller (DC) performs isochronous memory transfers, and thus, >> has a requirement for a minimum memory bandwidth that shall be fulfilled, >> otherwise framebuffer data can't be fetched fast enough and this results >> in a DC's data-FIFO underflow that follows by a visual corruption. > [...] >> +static unsigned long >> +tegra_plane_overlap_mask(struct drm_crtc_state *state, >> + const struct drm_plane_state *plane_state) >> +{ >> + const struct drm_plane_state *other_state; >> + const struct tegra_plane *tegra; >> + unsigned long overlap_mask = 0; >> + struct drm_plane *plane; >> + struct drm_rect rect; >> + >> + if (!plane_state->visible || !plane_state->fb) >> + return 0; >> + >> + drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) { > [...] >> + /* >> + * Data-prefetch FIFO will easily help to overcome temporal memory >> + * pressure if other plane overlaps with the cursor plane. >> + */ >> + if (tegra_plane_is_cursor(plane_state) && overlap_mask) >> + return 0; >> + >> + return overlap_mask; >> +} > > Since for cursor plane this always returns 0, you could test > tegra_plane_is_cursor() at the start of the function. Yes, thanks. >> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc, >> + struct drm_atomic_state *state) > [...] >> + /* >> + * For overlapping planes pixel's data is fetched for each plane at >> + * the same time, hence bandwidths are accumulated in this case. >> + * This needs to be taken into account for calculating total bandwidth >> + * consumed by all planes. >> + * >> + * Here we get the overlapping state of each plane, which is a >> + * bitmask of plane indices telling with what planes there is an >> + * overlap. Note that bitmask[plane] includes BIT(plane) in order >> + * to make further code nicer and simpler. >> + */ >> + drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) { >> + tegra_state = to_const_tegra_plane_state(plane_state); >> + tegra = to_tegra_plane(plane); >> + >> + if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM)) >> + return -EINVAL; >> + >> + plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth; >> + mask = tegra_plane_overlap_mask(new_state, plane_state); >> + overlap_mask[tegra->index] = mask; >> + >> + if (hweight_long(mask) != 3) >> + all_planes_overlap_simultaneously = false; >> + } >> + >> + old_state = drm_atomic_get_old_crtc_state(state, crtc); >> + old_dc_state = to_const_dc_state(old_state); >> + >> + /* >> + * Then we calculate maximum bandwidth of each plane state. >> + * The bandwidth includes the plane BW + BW of the "simultaneously" >> + * overlapping planes, where "simultaneously" means areas where DC >> + * fetches from the planes simultaneously during of scan-out process. >> + * >> + * For example, if plane A overlaps with planes B and C, but B and C >> + * don't overlap, then the peak bandwidth will be either in area where >> + * A-and-B or A-and-C planes overlap. >> + * >> + * The plane_peak_bw[] contains peak memory bandwidth values of >> + * each plane, this information is needed by interconnect provider >> + * in order to set up latency allowness based on the peak BW, see >> + * tegra_crtc_update_memory_bandwidth(). >> + */ >> + for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) { >> + overlap_bw = 0; >> + >> + for_each_set_bit(k, &overlap_mask[i], 3) { >> + if (k == i) >> + continue; >> + >> + if (all_planes_overlap_simultaneously) >> + overlap_bw += plane_peak_bw[k]; >> + else >> + overlap_bw = max(overlap_bw, plane_peak_bw[k]); >> + } >> + >> + new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw; >> + >> + /* >> + * If plane's peak bandwidth changed (for example plane isn't >> + * overlapped anymore) and plane isn't in the atomic state, >> + * then add plane to the state in order to have the bandwidth >> + * updated. >> + */ >> + if (old_dc_state->plane_peak_bw[i] != >> + new_dc_state->plane_peak_bw[i]) { >> + plane = tegra_crtc_get_plane_by_index(crtc, i); >> + if (!plane) >> + continue; >> + >> + plane_state = drm_atomic_get_plane_state(state, plane); >> + if (IS_ERR(plane_state)) >> + return PTR_ERR(plane_state); >> + } >> + } > [...] > > Does it matter to which channel (plane) the peak bw is attached? Would > it still work if the first channel specified max(peak_bw of overlaps) > and others only zeroes? The latency allowance will be configured incorrectly for the case of zeroes by the memory driver, hence it does matter. >> + /* >> + * Horizontal downscale needs a lower memory latency, which roughly >> + * depends on the scaled width. Trying to tune latency of a memory >> + * client alone will likely result in a strong negative impact on >> + * other memory clients, hence we will request a higher bandwidth >> + * since latency depends on bandwidth. This allows to prevent memory >> + * FIFO underflows for a large plane downscales, meanwhile allowing >> + * display to share bandwidth fairly with other memory clients. >> + */ >> + if (src_w > dst_w) >> + mul = (src_w - dst_w) * bpp / 2048 + 1; >> + else >> + mul = 1; > [...] > > One point is unexplained yet: why is the multiplier proportional to a > *difference* between src and dst widths? Also, I would expect max (worst > case) is pixclock * read_size when src_w/dst_w >= read_size. IIRC, the difference gives a more adequate/practical result than the proportion. Although, downstream driver uses proportion. I'll try to revisit this for the next version of the patch. > BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ... Indeed, and should be a bit nicer to use 'mul' in both cases.