Re: [Intel-gfx] [PATCH 4/9] drm/i915: Add check for corrupt raw EDID header for Displayport compliance testing

Todd Previte <tprevite@xxxxxxxxx> · Fri, 10 Apr 2015 07:44:20 -0700

On 4/8/2015 3:37 PM, Paulo Zanoni wrote:
2015-04-08 18:43 GMT-03:00 Todd Previte<tprevite@xxxxxxxxx>:
On 4/8/2015 9:51 AM, Paulo Zanoni wrote:
2015-03-31 14:15 GMT-03:00 Todd Previte<tprevite@xxxxxxxxx>:
Displayport compliance test 4.2.2.6 requires that a source device be
capable of detecting
a corrupt EDID. To do this, the test sets up an invalid EDID header to be
read by the source
device. Unfortunately, the DRM EDID reading and parsing functions are
actually too good in
this case and prevent the source from reading the corrupted EDID. The
result is a failed
compliance test.

In order to successfully pass the test, the raw EDID header must be
checked on each read
to see if it has been "corrupted". If an invalid raw header is detected, a
flag is set that
allows the compliance testing code to acknowledge that fact and react
appropriately. The
flag is automatically cleared on read.

This code is designed to expressly work for compliance testing without
disrupting normal
operations for EDID reading and parsing.

Signed-off-by: Todd Previte<tprevite@xxxxxxxxx>
Cc:dri-devel@xxxxxxxxxxxxxxxxxxxxx
---
   drivers/gpu/drm/drm_edid.c       | 33 +++++++++++++++++++++++++++++++++
   drivers/gpu/drm/i915/intel_dp.c  | 17 +++++++++++++++++
   drivers/gpu/drm/i915/intel_drv.h |  1 +
   include/drm/drm_edid.h           |  5 +++++
   4 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 53bc7a6..3d4f473 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -990,6 +990,32 @@ static const u8 edid_header[] = {
          0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
   };

+
+/* Flag for EDID corruption testing
+ * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
+ */
+static bool raw_edid_header_corrupted;
A static variable like this is not a good design, especially for a
module like drm.ko. If you really need this, please store it inside
some struct. But see below first.
Per our discussion this morning, I concur. This has been removed in favor of
a different solution that uses a new boolean flag in the drm_connector
struct.

Capturing more of the discussion here, the static boolean was a bad idea to
begin with and needed to be removed. One solution was to make the flag
non-static and non-clear-on-read, then add a separate clear() function. But
it still had the problem of potential misuse in other places in the code. The
current solution (which will be posted with V5) modifies the is_valid()
function and adds a flag in the drm_connector struct that can be used to
detect this low-level header corruption.
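
A minimal sketch of the per-connector approach described above. The V5 code is not part of this mail, so the struct, the field name edid_header_corrupt and the sketch_* helpers are hypothetical stand-ins. Only the header bytes and the "score != 8" test are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed 8-byte EDID header, same bytes as edid_header[] in drm_edid.c. */
static const uint8_t sketch_edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Hypothetical stand-in for the relevant part of struct drm_connector. */
struct sketch_connector {
	bool edid_header_corrupt;	/* hypothetical field name */
};

/* Count the header bytes that match; 8 means the header is intact. */
static int sketch_header_score(const uint8_t *raw_edid)
{
	int i, score = 0;

	assert(raw_edid != NULL);
	for (i = 0; i < 8; i++)
		if (raw_edid[i] == sketch_edid_header[i])
			score++;
	return score;
}

/*
 * Record header corruption on the connector that owns the read, so no
 * module-wide static state is needed. The flag is never cleared here;
 * the caller decides when to reset it.
 */
static int sketch_check_header(struct sketch_connector *connector,
			       const uint8_t *raw_edid)
{
	int score = sketch_header_score(raw_edid);

	if (connector && score != 8)
		connector->edid_header_corrupt = true;
	return score;
}
```

Because the flag lives on the connector, two connectors being probed at the same time cannot clobber each other's state, which is the misuse the static variable invited.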


+
+/**
+ * drm_raw_edid_header_valid - check to see if the raw header is
+ * corrupt or not. Used solely for Displayport compliance
+ * testing and required by Link CTS Core 1.2 rev1.1 4.2.2.6.
+ * @raw_edid: pointer to raw base EDID block
+ *
+ * Indicates whether the original EDID header as read from the
+ * device was corrupt or not. Clears on read.
+ *
+ * Return: true if the raw header was corrupt, otherwise false
+ */
+bool drm_raw_edid_header_corrupt(void)
+{
+       bool corrupted = raw_edid_header_corrupted;
+
+       raw_edid_header_corrupted = 0;
+       return corrupted;
+}
+EXPORT_SYMBOL(drm_raw_edid_header_corrupt);
+
   /**
    * drm_edid_header_is_valid - sanity check the header of the base EDID
block
    * @raw_edid: pointer to raw base EDID block
@@ -1006,6 +1032,13 @@ int drm_edid_header_is_valid(const u8 *raw_edid)
                  if (raw_edid[i] == edid_header[i])
                          score++;

+       if (score != 8) {
+               /* Log and set flag here for EDID corruption testing
+                * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
+                */
+               DRM_DEBUG_DRIVER("Raw EDID header invalid\n");
+               raw_edid_header_corrupted = 1;
+       }
The problem is that here we're limiting ourselves to just a bad edid
header, not a bad edid in general, so there are many things which we
might not get - such as a simple wrong checksum edid value. I remember
that on the previous patch you calculated the whole checksum manually,
but I don't see that code anymore. What was the reason for the change?
So this code is specifically for the 4.2.2.6 compliance test that is looking
for nothing more than an invalid EDID header.
On the version of the spec I have (1.2 Core, Aug 22 2011), 4.2.2.6 is
"EDID Corruption Detection", and it mentions "EDID corruption" without
really getting into the details of header corruption. On the "Test
procedure" description, it mentions "Reference Sink sets up EDID with
incorrect checksum", which we don't check. Of course, changing the
header may produce an incorrect checksum, but maybe the wrong header
is just a particular detail of the compliance testing device you have,
while others could potentially have other forms of corruption, such as
just a bad checksum?
It could very well be particular to this unit. So with a different test 
device, we might be able to get away with just checking the checksum. 
For this one, however, we don't appear to have that option. I added the 
checksum computation into the header fixup code just to make sure.
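
For reference, a base EDID block is 128 bytes long and its last byte is chosen so that all 128 bytes sum to zero modulo 256. The sketch below is only an illustration of that rule, not the DRM implementation; the ck_* names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CK_EDID_LENGTH 128

/* Sum of all 128 bytes modulo 256; zero for a block with a good checksum. */
static uint8_t ck_edid_block_sum(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	assert(block != NULL);
	for (i = 0; i < CK_EDID_LENGTH; i++)
		sum = (uint8_t)(sum + block[i]);
	return sum;
}

/* Value that byte 127 must hold so that the whole block sums to zero. */
static uint8_t ck_edid_expected_checksum(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	assert(block != NULL);
	for (i = 0; i < CK_EDID_LENGTH - 1; i++)
		sum = (uint8_t)(sum + block[i]);
	return (uint8_t)(0x100 - sum);
}
```

This also shows why a corrupted header usually shows up as a checksum failure too: changing any single header byte changes the sum unless the checksum byte is adjusted to match.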

In the paragraphs below you elaborate even more on the assumption of a
bad header instead of just a bad checksum, so maybe we have different
versions of the spec? (I still remember when I used version 1.0 of a
certain non-backwards-compatible spec to review a patch made against
version 0.8 of the same spec)
I do have a later version of the spec, but the description of this test 
seems to be the same between the two.
In fact, the test unit only
sets that header as invalid once, so if you miss it on the first read, you
can't go back and check it again later - the test will now fail. So catching
the general case isn't really what this is about - it's about being able to
detect a corrupt EDID header even if it only happens once.

Honestly, the DRM EDID code is VERY good about catching corruption cases and
in the case of corrupted headers, fixing them and moving on. I had to tie
into it at a fairly low level in order to catch the invalid header before
the code fixed it.

With respect to the checksum code, for quite a while the checksum
computation was incorrect in the DRM code. Somewhere along in November of
last year or 2013 (I remember the month, not the year, go figure) someone
came along and added a checksum computation that actually worked. So that
rendered that original code I wrote unnecessary.

Also, while reviewing the patch I just discovered
connector->bad_edid_counter. Can't we just use it instead of this
patch? I mean: grab the current counter, check edid, see if the
counter moved.
I think the above description highlights why using this counter really isn't
an option. Since the code only gets one shot at catching that invalid
header, it's essential to make sure it's captured specifically. Comparing
before and after values of this counter doesn't specifically say that the
header was invalid, only that SOMEthing in the EDID was invalid.
Which is, according to the way I read the spec, not a problem.
I completely agree with you. Unfortunately, coding directly to the spec 
isn't enough in this case.
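
To make the counter argument concrete, here is a small model of the two approaches. It assumes, as described above, that the counter moves for any kind of invalid EDID. The struct and function are hypothetical and only mimic connector->bad_edid_counter; they are not the DRM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_EDID_LENGTH 128

static const uint8_t model_edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Hypothetical model of the per-connector state under discussion. */
struct model_connector {
	unsigned int bad_edid_counter;	/* moves on ANY invalid block */
	bool header_corrupt;		/* set only for a bad header */
};

/* Validate one base block and update both indicators. */
static bool model_block_valid(struct model_connector *c, const uint8_t *block)
{
	bool header_ok;
	uint8_t sum = 0;
	int i;

	assert(c != NULL && block != NULL);
	header_ok = memcmp(block, model_edid_header,
			   sizeof(model_edid_header)) == 0;
	for (i = 0; i < MODEL_EDID_LENGTH; i++)
		sum = (uint8_t)(sum + block[i]);

	if (!header_ok)
		c->header_corrupt = true;
	if (!header_ok || sum != 0)
		c->bad_edid_counter++;
	return header_ok && sum == 0;
}
```

In this model a block with an intact header and a wrong checksum bumps the counter exactly like a block with a broken header does, so a before/after comparison of the counter cannot tell the two cases apart, while the dedicated flag can.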


          return score;
   }
   EXPORT_SYMBOL(drm_edid_header_is_valid);
diff --git a/drivers/gpu/drm/i915/intel_dp.c
b/drivers/gpu/drm/i915/intel_dp.c
index dc87276..57f8e43 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -3824,6 +3824,9 @@ update_status:
                                     &response, 1);
          if (status <= 0)
                  DRM_DEBUG_KMS("Could not write test response to
sink\n");
+
+       /* Clear flag here, after testing is complete*/
+       intel_dp->compliance_edid_invalid = 0;
   }

   static int
@@ -3896,6 +3899,10 @@ intel_dp_check_link_status(struct intel_dp
*intel_dp)
   {
          struct drm_device *dev = intel_dp_to_dev(intel_dp);
          struct intel_encoder *intel_encoder =
&dp_to_dig_port(intel_dp)->base;
+       struct drm_connector *connector =
&intel_dp->attached_connector->base;
+       struct i2c_adapter *adapter = &intel_dp->aux.ddc;
+       struct edid *edid_read = NULL;
+
          u8 sink_irq_vector;
          u8 link_status[DP_LINK_STATUS_SIZE];

@@ -3912,6 +3919,16 @@ intel_dp_check_link_status(struct intel_dp
*intel_dp)
                  return;
          }

+       /* Compliance testing requires an EDID read for all HPD events
+        * Link CTS Core 1.2 rev 1.1: Test 4.2.2.1
+        * Flag set here will be handled in the EDID test function
+        */
+       edid_read = drm_get_edid(connector, adapter);
+       if (!edid_read || drm_raw_edid_header_corrupt() == 1) {
+               DRM_DEBUG_DRIVER("EDID invalid, setting flag\n");
+               intel_dp->compliance_edid_invalid = 1;
+       }
I see that on the next patch you also add a drm_get_edid() call, so we
have apparently added 2 calls for the edid test. Do we really need
both? Why is this one needed? Why is that one needed?
So there are two issues here - first is the same one mentioned above, catching
that single instance of a corrupted EDID header. The second is that the
checksum from the test device differs between the two reads. If you remove
either one of them, one test or the other will fail.
But then why not keep both at the same place? The one here is going to
affect a lot more than just compliance testing, while the other is
contained to DP compliance code.
I was able to find a solution that removed the duplicate EDID read. I 
had to add a checksum storage variable in the intel_dp struct, but 
that's infinitely better than having another EDID read.
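
A rough sketch of that idea, for illustration only. The compliance_edid_invalid field is from this patch; the compliance_edid_checksum field name and the helper are hypothetical, since the reworked code is not included in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DP_SKETCH_EDID_LENGTH 128

/* Hypothetical subset of struct intel_dp used for compliance testing. */
struct sketch_intel_dp {
	bool compliance_edid_invalid;		/* field added by this patch */
	uint8_t compliance_edid_checksum;	/* hypothetical field name */
};

/*
 * Called once per HPD event with the result of the single EDID read.
 * raw is NULL when the read failed; header_corrupt is whatever the
 * low-level header check reported for that read.
 */
static void sketch_record_edid(struct sketch_intel_dp *dp,
			       const uint8_t *raw, bool header_corrupt)
{
	assert(dp != NULL);
	if (!raw || header_corrupt) {
		dp->compliance_edid_invalid = true;
		return;
	}
	/* Keep the checksum byte so the test handler needs no second read. */
	dp->compliance_edid_checksum = raw[DP_SKETCH_EDID_LENGTH - 1];
}
```

The test handler can then consume the stored byte later instead of issuing its own drm_get_edid() call.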

Unfortunately though, the one that has to stay is in the 
check_link_status. There's just no way around it because of the test 
4.2.2.1 that requires it to happen for a hot plug event. There's no test 
request bit set for that, or any other indicator. It simply has to 
happen for every HPD plug event.

Also, some more ideas:

I also thought that we already automatically issued get_edid() calls
on the normal hotplug code path, so it would be a "third" call on the
codepath for the test. Can't we just rely on this one?
Same issue as above.
Another idea would be: instead of getting the edid from inside the
Kernel, we could try to get it from the user-space, using the
GetResources/GetConnector IOCTLs, and also maybe look at the EDID
properties to possibly validate the EDID (in case that edid did not
get "fixed" by the Kernel). The nice thing about this is that it would
make the test be more like a real driver usage. Do you see any
possible problems with this approach?
I don't really see this as a valid option in light of the descriptions I've
given above. This has a good chance of introducing latency problems which
may adversely affect the tests as well.
We have 5 seconds, that's way more than enough.
The test has a 5 second timeout for the entire operation. I'm less 
concerned with timing out and more concerned about not being able to 
catch things fast enough or react fast enough to parameter or value 
changes. It may or may not be an issue for processing the EDID (I'd lean 
more towards the not case) but it's something that has to be kept in 
mind here, as this has caused problems in the past when building out the 
test interfaces.

In any case, this sounds like a suggestion rather than a 
blocking issue. My main concern with moving all this stuff into 
userspace is that it's moving towards building a Displayport-compliant 
user app versus a Displayport-compliant driver. But this is something 
that I can look into sometime down the road.


+
          /* Try to read the source of the interrupt */
          if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11 &&
              intel_dp_get_sink_irq(intel_dp, &sink_irq_vector)) {
diff --git a/drivers/gpu/drm/i915/intel_drv.h
b/drivers/gpu/drm/i915/intel_drv.h
index e7b62be..42e4251 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -651,6 +651,7 @@ struct intel_dp {
          /* Displayport compliance testing */
          unsigned long compliance_test_type;
          bool compliance_testing_active;
+       bool compliance_edid_invalid;
   };

   struct intel_digital_port {
diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
index 87d85e8..8a7eb22 100644
--- a/include/drm/drm_edid.h
+++ b/include/drm/drm_edid.h
@@ -388,4 +388,9 @@ struct edid *drm_do_get_edid(struct drm_connector
*connector,
                                size_t len),
          void *data);

+/* Check for corruption in raw EDID header - Displayport compliance
+  * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
+ */
+bool drm_raw_edid_header_corrupt(void);
+
   #endif /* __DRM_EDID_H__ */
--
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


