On 10/18/2011 11:14, Jean Delvare wrote: > Hi Dave, Eugeni, Alex, > > On Tue, 18 Oct 2011 11:02:00 +0100, Dave Airlie wrote: >>> This allows to avoid talking to a non-existent bus repeatedly until we >>> finally timeout. The non-existent bus is signalled by -ENXIO error, >>> provided by i2c_algo_bit:bit_doAddress call. >>> >>> As the advantage of such change, all the other routines which use >>> drm_get_edid would benefit for this timeout. >>> >>> This change should fix >>> https://bugs.freedesktop.org/show_bug.cgi?id=41059 and improve overall >>> edid detection timing by 10-30% in most cases, and by a much larger margin >>> in case of phantom outputs. >> >> Jean, Alex, >> >> I'm thinking of thowing this into -next, any objections? > > This seems to be the wrong place for the test. The code below is really > testing for non-responsive (possibly not present) EDID, NOT > "non-existent adapter". Non-existent adapter should be checked in the > firmware if possible, or failing that, by testing the bus lines at bus > creation time, as i2c_algo_bit.bit_test=1 is doing. While this setting > has been known to cause trouble recently (not per se but because it was > triggering a bug somewhere else in the radeon driver), it might still > have value, and could be changed to a per-adapter setting by exporting > the test function (I have a patch ready doing exactly this) and letting > video drivers test their I2C buses and discard the non-working ones if > they want. > > I am worried that the patch below will cause regressions on other > machines. There's a comment right before the loop which reads: > > /* The core i2c driver will automatically retry the transfer if the > * adapter reports EAGAIN. However, we find that bit-banging transfers > * are susceptible to errors under a heavily loaded machine and > * generate spurious NAKs and timeouts. Retrying the transfer > * of the individual block a few times seems to overcome this. > */ > > So the retries are there for a reason, and -ENXIO is exactly what you > get on spurious NAKs. > Hi Jean, You are right about the bit_test=1 testing, my initial attempt at the fix did exactly that (https://bugs.freedesktop.org/show_bug.cgi?id=41059, comments 14 and 15). The problem is that for some buses, namely HDMI ones in my testing, bit_test-like tests always consider them as non-existent when the connectivity is not setup; and as working when it is. So in any case, we could not just blacklist them - when they do, they are gone for good, and won't work anymore, and we need to keep re-trying them at each attempt. And in case of continuous pre-testing for the stuck bits and like when driver attempts to get the edid (for example, when xrandr is run), we still hit the same issue - the drm_do_probe_ddc_edid will continue to retry the attempts until it reaches the maximum number of retries. E.g., there is no way to tell drm_do_probe_ddc_edid to treat any return code as a permanent failure and make it give up faster. It could be fixed either in per-driver fashion, like I did with the other patch (which also tests for -ENXIO); or in a generic way, in DRM. I couldn't reproduce any false positives coming from -ENXIO on i915 driver, but perhaps it is easier with radeon? Do you have any specific workload which trick the driver into generating spurious NAKs by a chance? > One thing which may make sense would be to make the retry count a > module parameter, instead of a hard-coded value, so that users can > lower the value if they care more about boot time than reliability. But > again, ideally problematic buses would not have been created in the > first place. Or perhaps it would be possible to have any error code coming from i2c_transfer to signal a permanent error, which should not be retried.. what do you think? -- Eugeni Dodonov _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel