Re: Build error in i915/xe

Guenter Roeck <linux@xxxxxxxxxxxx> · Mon, 20 Jan 2025 06:15:30 -0800

On 1/20/25 03:21, Jani Nikula wrote:
On Mon, 20 Jan 2025, David Laight <david.laight.linux@xxxxxxxxx> wrote:
On Mon, 20 Jan 2025 12:48:11 +0200
Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> wrote:

On Sun, 19 Jan 2025, David Laight <david.laight.linux@xxxxxxxxx> wrote:
On Sat, 18 Jan 2025 14:58:48 -0800
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 1/18/25 14:11, David Laight wrote:
On Sat, 18 Jan 2025 13:21:39 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

On Sat, 18 Jan 2025 at 09:49, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

No idea why the compiler would know that the values are invalid.

It's not that the compiler knows that they are invalid, but I bet what
happens is in scale() (and possibly other places that do similar
checks), which does this:

          WARN_ON(source_min > source_max);
          ...
          source_val = clamp(source_val, source_min, source_max);

and the compiler notices that the ordering comparison in the first
WARN_ON() is the same as the one in clamp(), so it basically converts
the logic to

          if (source_min > source_max) {
                  WARN(..);
                  /* Do the clamp() knowing that source_min > source_max */
                  source_val = clamp(source_val, source_min, source_max);
          } else {
                  /* Do the clamp knowing that source_min <= source_max */
                  source_val = clamp(source_val, source_min, source_max);
          }

(obviously I dropped the other WARN_ON in the conversion, it wasn't
relevant for this case).

And now that first clamp() case is done with source_min > source_max,
and it triggers that build error because that's invalid.

So the condition is not statically true in the *source* code, but in
the "I have moved code around to combine tests" case it now *is*
statically true as far as the compiler is concerned.

Well spotted :-)

One option would be to move the WARN_ON() below the clamp() and
add an OPTIMIZER_HIDE_VAR(source_max) between them.
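
A minimal userspace sketch of that first option, for illustration only. It
assumes GCC or Clang. OPTIMIZER_HIDE_VAR() here is a local copy modelled on
the kernel macro, and clamp_u32(), WARN_ON(), warn_count and scale_hidden()
are hypothetical stand-ins, not the driver code. The plain clamp_u32() has no
build-time check, so this shows only the proposed ordering, not the build
error itself:

```c
#include <assert.h>
#include <stdint.h>

/* Local copy modelled on the kernel macro (GCC/Clang only): an empty asm
 * statement that makes the compiler forget what it knows about var. */
#define OPTIMIZER_HIDE_VAR(var) __asm__("" : "=r"(var) : "0"(var))

/* Hypothetical stand-in for the kernel's WARN_ON(): just counts hits. */
static unsigned int warn_count;
#define WARN_ON(cond) do { if (cond) warn_count++; } while (0)

/* Plain runtime clamp; unlike the kernel's clamp() it has no build check. */
static uint32_t clamp_u32(uint32_t val, uint32_t lo, uint32_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

static uint32_t scale_hidden(uint32_t source_val,
			     uint32_t source_min, uint32_t source_max,
			     uint32_t target_min, uint32_t target_max)
{
	uint64_t target_val;
	uint32_t range;

	/* Clamp first, before any comparison the compiler could reuse. */
	source_val = clamp_u32(source_val, source_min, source_max);

	/* Hide source_max so the WARN_ON() test below cannot be merged
	 * back into the clamp above. */
	OPTIMIZER_HIDE_VAR(source_max);

	WARN_ON(source_min > source_max);
	WARN_ON(target_min > target_max);

	range = source_max - source_min;
	assert(range != 0);	/* a zero-width range is not handled here */

	target_val = (uint64_t)(source_val - source_min) *
		     (target_max - target_min);
	target_val = (target_val + range / 2) / range;	/* round to closest */

	return target_min + (uint32_t)target_val;
}
```

With a valid range the warning counter stays at zero; with source_min and
source_max swapped the call still returns, but the counter is bumped.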

Or do something more sensible than the WARN().
Perhaps return target_min on any such errors?

This helps:

-       WARN_ON(source_min > source_max);
-       WARN_ON(target_min > target_max);
-
          /* defensive */
          source_val = clamp(source_val, source_min, source_max);

+       WARN_ON(source_min > source_max);
+       WARN_ON(target_min > target_max);

That is a 'quick fix' ...

Much better would be to replace the WARN() with (say):
	if (target_min >= target_max)
		return target_min;
	if (source_min >= source_max)
		return target_min + (target_max - target_min)/2;
So that the return values are actually in range (inasmuch as one is defined).
Note that the >= comparisons also remove a divide by zero.
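
As a rough userspace sketch of how those guards might sit in front of the
scaling arithmetic (the names scale_guarded() and clamp_u32() are
hypothetical, and the rounding is a plain-C approximation of the driver's
round-to-closest division, not the actual i915/xe code):

```c
#include <assert.h>
#include <stdint.h>

/* Plain runtime clamp; callers guarantee lo <= hi. */
static uint32_t clamp_u32(uint32_t val, uint32_t lo, uint32_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

static uint32_t scale_guarded(uint32_t source_val,
			      uint32_t source_min, uint32_t source_max,
			      uint32_t target_min, uint32_t target_max)
{
	uint64_t target_val;
	uint32_t range;

	/* Empty or inverted target range: the only defensible answer. */
	if (target_min >= target_max)
		return target_min;

	/* Empty or inverted source range: return the target midpoint. */
	if (source_min >= source_max)
		return target_min + (target_max - target_min) / 2;

	source_val = clamp_u32(source_val, source_min, source_max);

	/* The guard above guarantees a nonzero divisor. */
	range = source_max - source_min;
	assert(range != 0);

	/* 64-bit product avoids overflow of the 32-bit operands. */
	target_val = (uint64_t)(source_val - source_min) *
		     (target_max - target_min);
	target_val = (target_val + range / 2) / range;	/* round to closest */

	return target_min + (uint32_t)target_val;
}
```

Every return path yields a value inside [target_min, target_max] when that
range is valid, and the division is never reached with a zero divisor.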

I want the loud and early warnings for clear bugs instead of
"gracefully" silencing the errors only to be found through debugging
user reports.

A user isn't going to notice a WARN() - not until you tell them to look for it.
In any case even if you output a message you really want to return a 'sane'
value, who knows what effect a very out of range value is going to have.

The point is, we'll catch the WARN in CI before it goes out to users.

It isn't going to catch the divide by 0 error, and it obviously doesn't
catch the build problem on parisc with gcc 13.x because the CI isn't
testing it.

How about disabling DRM_XE on architectures where it isn't supported,
matching DRM_I915?

Thanks,
Guenter