2014-07-14 16:10 GMT-03:00 Todd Previte <tprevite@xxxxxxxxx>: > Link training for Displayport can fail in many ways and at multiple different points > during the training process. Previously, errors were logged but no additional action > was taken based on them. Consequently, training attempts could continue even after > errors have occured that would prevent successful link training. This patch updates > the link training functions and where/how they're used to be more intelligent about > failures and to stop trying to train the link when it's a lost cause. I agree we need to do something about this problem, but I'm not sure how this patch improves the situation. Can you please describe more how exactly the changes you did are getting us towards the solution of the problem? Of course, the points where you start signaling previously-unsignaled errors are obviously an improvement. Anyway, this patch should probably be split in 3: - A patch to add the boolean return values and change intel_dp_check_link_status() + intel_enable_dp() + intel_ddi_pre_enable(). - A patch to signal dpcd error cases we were previously ignoring. - A patch to that changes how intel_dp_start_link_train() and intel_dp_stop_link_train() currently behave (the "goto"s replacing "break" statements). Se below for better explanations. The big problem here is that these encoder callbacks never fail, so there's not really much to do after we detect a sink failure. In the current code (without your patch), we already clearly signal the link training failures with debug+error messages, so the new debug messages at places linke intel_enable_dp() are not much of an improvement. Also, we already run intel_dp_set_idle_link_train() at the end of intel_dp_complete_link_train(), and we do additional things such as calling intel_dp_stop_link_train(). And I guess we do the non-DDI equivalent steps at some point too, so I'm not sure how jumping straight to intel_dp_set_idle_link_train() helps, since we do it anyway as part of the normal sequence. Also, our mode set sequence is currently completely followed - even though the sink fails to understand what we throw at it - and I'm always afraid of not following the sequence exactly as described in the spec, since it could lead to unpredicted bugs (we had this problem dozens of times in the past). I think the real cool solution would be to retry link training with different parameters (different clock and number of lanes), but I imagine this would require a lot of code refactoring since we probably need to go back to the compute_config stages of the modeset sequence. Or maybe just finding a way to tell the user-space modesetting app that it has a black screen would already be helpful. Other people may think that the real-real long-term solution would be to fix our code so it never fails link training or gives black screens :) Some more below: > > Signed-off-by: Todd Previte <tprevite@xxxxxxxxx> > --- > drivers/gpu/drm/i915/intel_ddi.c | 23 +++++++++-- > drivers/gpu/drm/i915/intel_dp.c | 89 +++++++++++++++++++++++++++++----------- > drivers/gpu/drm/i915/intel_drv.h | 7 ++-- > 3 files changed, 90 insertions(+), 29 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c > index ded6013..c0727b8 100644 > --- a/drivers/gpu/drm/i915/intel_ddi.c > +++ b/drivers/gpu/drm/i915/intel_ddi.c > @@ -1246,6 +1246,7 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder) > struct intel_crtc *crtc = to_intel_crtc(encoder->crtc); > enum port port = intel_ddi_get_encoder_port(intel_encoder); > int type = intel_encoder->type; > + uint8_t fail_code = 0; > > if (crtc->config.has_audio) { > DRM_DEBUG_DRIVER("Audio on pipe %c on DDI\n", > @@ -1274,10 +1275,19 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder) > intel_dp->DP |= DDI_PORT_WIDTH(intel_dp->lane_count); > > intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON); > - intel_dp_start_link_train(intel_dp); > - intel_dp_complete_link_train(intel_dp); > + if (!intel_dp_start_link_train(intel_dp)) { > + fail_code = 1; > + goto failed; > + } > + if (!intel_dp_complete_link_train(intel_dp)) { > + fail_code = 2; > + goto failed; > + } > if (port != PORT_A) > - intel_dp_stop_link_train(intel_dp); > + if (!intel_dp_stop_link_train(intel_dp)) { > + fail_code = 3; > + goto failed; > + } > } else if (type == INTEL_OUTPUT_HDMI) { > struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(encoder); > > @@ -1285,6 +1295,13 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder) > crtc->config.has_hdmi_sink, > &crtc->config.adjusted_mode); > } > + > + return; > + > +failed: > + /* Clear link training here */ > + intel_dp_set_idle_link_train(enc_to_intel_dp(encoder)); > + DRM_DEBUG_KMS("Failed to pre-enable DP, fail code %d\n", fail_code); > } > > static void intel_ddi_post_disable(struct intel_encoder *intel_encoder) > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > index 88f1bbe..1c6ee34 100644 > --- a/drivers/gpu/drm/i915/intel_dp.c > +++ b/drivers/gpu/drm/i915/intel_dp.c > @@ -2018,23 +2018,42 @@ static void chv_post_disable_dp(struct intel_encoder *encoder) > mutex_unlock(&dev_priv->dpio_lock); > } > > -static void intel_enable_dp(struct intel_encoder *encoder) > +static bool intel_enable_dp(struct intel_encoder *encoder) > { > struct intel_dp *intel_dp = enc_to_intel_dp(&encoder->base); > struct drm_device *dev = encoder->base.dev; > struct drm_i915_private *dev_priv = dev->dev_private; > uint32_t dp_reg = I915_READ(intel_dp->output_reg); > + uint8_t fail_code = 0; > > + /* FIXME: Not sure this needs to be a WARN() */ You can git-blame that line and see its history :) Anyway, the point is that if you reach this point of the code, DP _must_ be disabled, so "dp_reg & DP_POR_EN" tells us there's a bug somewhere else. > if (WARN_ON(dp_reg & DP_PORT_EN)) > - return; > + return false; > > intel_edp_panel_vdd_on(intel_dp); > intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON); > - intel_dp_start_link_train(intel_dp); > + if (!intel_dp_start_link_train(intel_dp)) { > + fail_code = 1; > + goto failed; > + } > intel_edp_panel_on(intel_dp); > edp_panel_vdd_off(intel_dp, true); > - intel_dp_complete_link_train(intel_dp); > - intel_dp_stop_link_train(intel_dp); > + if (!intel_dp_complete_link_train(intel_dp)) { > + fail_code = 2; > + goto failed; > + } > + if (!intel_dp_stop_link_train(intel_dp)) { > + fail_code = 3; > + goto failed; > + } > + > + return true; > + > +failed: > + /* Clear link training here */ > + intel_dp_set_idle_link_train(intel_dp); Function intel_enable_dp() is called by all non-DDI gens, but intel_dp_set_idle_link_train() has an early return for !HAS_DDI. So this function call is basically doing nothing here. IMHO this is the biggest problem with this patch. > + DRM_DEBUG_KMS("Failed to enable DP with code %d\n", fail_code); > + return false; > } > > static void g4x_enable_dp(struct intel_encoder *encoder) > @@ -2956,7 +2975,7 @@ intel_dp_update_link_train(struct intel_dp *intel_dp, uint32_t *DP, > return ret == intel_dp->lane_count; > } > > -static void intel_dp_set_idle_link_train(struct intel_dp *intel_dp) > +void intel_dp_set_idle_link_train(struct intel_dp *intel_dp) > { > struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp); > struct drm_device *dev = intel_dig_port->base.base.dev; > @@ -2988,7 +3007,7 @@ static void intel_dp_set_idle_link_train(struct intel_dp *intel_dp) > } > > /* Enable corresponding port and start training pattern 1 */ > -void > +bool > intel_dp_start_link_train(struct intel_dp *intel_dp) > { > struct drm_encoder *encoder = &dp_to_dig_port(intel_dp)->base.base; > @@ -3007,11 +3026,17 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > link_config[1] = intel_dp->lane_count; > if (drm_dp_enhanced_frame_cap(intel_dp->dpcd)) > link_config[1] |= DP_LANE_COUNT_ENHANCED_FRAME_EN; > - drm_dp_dpcd_write(&intel_dp->aux, DP_LINK_BW_SET, link_config, 2); > + if (drm_dp_dpcd_write(&intel_dp->aux, DP_LINK_BW_SET, link_config, 2) != 2) { > + DRM_DEBUG_KMS("Failed to write sink DPCD for link rate and lane count\n"); > + goto failed; > + } > > link_config[0] = 0; > link_config[1] = DP_SET_ANSI_8B10B; > - drm_dp_dpcd_write(&intel_dp->aux, DP_DOWNSPREAD_CTRL, link_config, 2); > + if (drm_dp_dpcd_write(&intel_dp->aux, DP_DOWNSPREAD_CTRL, link_config, 2) != 2) { > + DRM_DEBUG_KMS("Failed to write sink DPCD for downspread control\n"); > + goto failed; > + } These two above are useful additions and could probably go into a separate patch. But maybe I'd make them be DRM_ERRORs since we probably want to easily notice them - and get the bug reports. Maybe we could also create a new dp_dpcd_write_safe() function/macro that would be responsible for printing error messages in case we don't transfer all the bits we want, then we could make the whole intel_dp.c file use it. The nice thing of being a macro would be that the DRM_ERROR would print the name of the caller function. > > DP |= DP_PORT_EN; > > @@ -3020,7 +3045,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > DP_TRAINING_PATTERN_1 | > DP_LINK_SCRAMBLING_DISABLE)) { > DRM_ERROR("failed to enable link training\n"); > - return; > + goto failed; > } > > voltage = 0xff; > @@ -3032,12 +3057,12 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd); > if (!intel_dp_get_link_status(intel_dp, link_status)) { > DRM_ERROR("failed to get link status\n"); > - break; > + goto failed; > } > > if (drm_dp_clock_recovery_ok(link_status, intel_dp->lane_count)) { > DRM_DEBUG_KMS("clock recovery OK\n"); > - break; > + goto cr_done; > } > > /* Check to see if we've tried the max voltage */ > @@ -3048,7 +3073,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > ++loop_tries; > if (loop_tries == 5) { > DRM_ERROR("too many full retries, give up\n"); > - break; > + goto failed; > } > intel_dp_reset_link_train(intel_dp, &DP, > DP_TRAINING_PATTERN_1 | > @@ -3062,7 +3087,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > ++voltage_tries; > if (voltage_tries == 5) { > DRM_ERROR("too many voltage retries, give up\n"); > - break; > + goto failed; > } > } else > voltage_tries = 0; > @@ -3071,14 +3096,20 @@ intel_dp_start_link_train(struct intel_dp *intel_dp) > /* Update training set as requested by target */ > if (!intel_dp_update_link_train(intel_dp, &DP, link_status)) { > DRM_ERROR("failed to update link training\n"); > - break; > + goto failed; > } > } > > +cr_done: > intel_dp->DP = DP; > + return true; > + > +failed: > + DRM_DEBUG_KMS("Failed to initiate link training\n"); > + return false; This set of changes where you replace "break" with "goto"s should be on a separate patch, with a nice explanation of what are the consequences of not doing "intel_dp->DP = DP" on the cases were we just "goto failed". If the link training failed, we should probably disable DP_PORT_EN. > } > > -void > +bool > intel_dp_complete_link_train(struct intel_dp *intel_dp) > { > bool channel_eq = false; > @@ -3095,7 +3126,7 @@ intel_dp_complete_link_train(struct intel_dp *intel_dp) > training_pattern | > DP_LINK_SCRAMBLING_DISABLE)) { > DRM_ERROR("failed to start channel equalization\n"); > - return; > + return false; > } > > tries = 0; > @@ -3154,14 +3185,17 @@ intel_dp_complete_link_train(struct intel_dp *intel_dp) > > intel_dp->DP = DP; > > - if (channel_eq) > + if (channel_eq) { > DRM_DEBUG_KMS("Channel EQ done. DP Training successful\n"); > + return true; > + } > > + return false; > } Same here. > > -void intel_dp_stop_link_train(struct intel_dp *intel_dp) > +bool intel_dp_stop_link_train(struct intel_dp *intel_dp) > { > - intel_dp_set_link_train(intel_dp, &intel_dp->DP, > + return intel_dp_set_link_train(intel_dp, &intel_dp->DP, > DP_TRAINING_PATTERN_DISABLE); > } > > @@ -3600,9 +3634,18 @@ intel_dp_check_link_status(struct intel_dp *intel_dp) > if (!drm_dp_channel_eq_ok(link_status, intel_dp->lane_count)) { > DRM_DEBUG_KMS("%s: channel EQ not ok, retraining\n", > intel_encoder->base.name); > - intel_dp_start_link_train(intel_dp); > - intel_dp_complete_link_train(intel_dp); > - intel_dp_stop_link_train(intel_dp); > + if (!intel_dp_start_link_train(intel_dp)) { > + DRM_DEBUG_KMS("Start link training failed\n"); > + return; > + } > + if (!intel_dp_complete_link_train(intel_dp)) { > + DRM_DEBUG_KMS("Complete link training failed\n"); > + return; > + } > + if (!intel_dp_stop_link_train(intel_dp)) { > + DRM_DEBUG_KMS("Stop link training failed\n"); > + return; > + } > } > } > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h > index d2ae54f..79876df 100644 > --- a/drivers/gpu/drm/i915/intel_drv.h > +++ b/drivers/gpu/drm/i915/intel_drv.h > @@ -844,9 +844,10 @@ int intel_dp_set_config(struct drm_mode_set *set); > void intel_dp_init(struct drm_device *dev, int output_reg, enum port port); > bool intel_dp_init_connector(struct intel_digital_port *intel_dig_port, > struct intel_connector *intel_connector); > -void intel_dp_start_link_train(struct intel_dp *intel_dp); > -void intel_dp_complete_link_train(struct intel_dp *intel_dp); > -void intel_dp_stop_link_train(struct intel_dp *intel_dp); > +bool intel_dp_start_link_train(struct intel_dp *intel_dp); > +bool intel_dp_complete_link_train(struct intel_dp *intel_dp); > +bool intel_dp_stop_link_train(struct intel_dp *intel_dp); > +void intel_dp_set_idle_link_train(struct intel_dp *intel_dp); > void intel_dp_sink_dpms(struct intel_dp *intel_dp, int mode); > void intel_dp_encoder_destroy(struct drm_encoder *encoder); > void intel_dp_check_link_status(struct intel_dp *intel_dp); > -- > 1.9.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Paulo Zanoni _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx