On 20.05.2022 10:30, Chuck Zmudzinski wrote:
> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:
>> On 5/20/2022 2:05 AM, Jan Beulich wrote:
>>> On 20.05.2022 06:43, Chuck Zmudzinski wrote:
>>>> On 5/4/22 5:14 AM, Juergen Gross wrote:
>>>>> On 04.05.22 10:31, Jan Beulich wrote:
>>>>>> On 03.05.2022 15:22, Juergen Gross wrote:
>>>>>>
>>>>>> ... these uses there are several more. You say nothing on why
>>>>>> those want leaving unaltered. When preparing my earlier patch I
>>>>>> did inspect them and came to the conclusion that these all would
>>>>>> also better observe the adjusted behavior (or else I couldn't have
>>>>>> left pat_enabled() as the only predicate). In fact, as said in the
>>>>>> description of my earlier patch, in my debugging I did find the use
>>>>>> in i915_gem_object_pin_map() to be the problematic one, which you
>>>>>> leave alone.
>>>>> Oh, I missed that one, sorry.
>>>> That is why your patch would not fix my Haswell unless
>>>> it also touches i915_gem_object_pin_map() in
>>>> drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>>
>>>>> I wanted to be rather defensive in my changes, but I agree at least
>>>>> the case in arch_phys_wc_add() might want to be changed, too.
>>>> I think your approach needs to be more aggressive so it will fix
>>>> all the known false negatives introduced by bdd8b6c98239
>>>> such as the one in i915_gem_object_pin_map().
>>>>
>>>> I looked at Jan's approach and I think it would fix the issue
>>>> with my Haswell as long as I don't use the nopat option. I
>>>> really don't have a strong opinion on that question, but I
>>>> think the nopat option as a Linux kernel option, as opposed
>>>> to a hypervisor option, should only affect the kernel, and
>>>> if the hypervisor provides the pat feature, then the kernel
>>>> should not override that,
>>> Hmm, why would the kernel not be allowed to override that? Such
>>> an override would affect only the single domain where the
>>> kernel runs; other domains could take their own decisions.
>>>
>>> Also, for the sake of completeness: "nopat" used when running on
>>> bare metal has the same bad effect on system boot, so there
>>> pretty clearly is an error cleanup issue in the i915 driver. But
>>> that's orthogonal, and I expect the maintainers may not even care
>>> (but tell us "don't do that then").
>
> Actually I just did a test with the last official Debian kernel
> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
> applied. In fact, the nopat option does *not* break the i915 driver
> in 5.16. That is, with the nopat option, the i915 driver loads
> normally on both the bare metal and on the Xen hypervisor.
> That means your presumption (and the presumption of
> the author of bdd8b6c98239) that the "nopat" option was
> being observed by the i915 driver is incorrect. Setting "nopat"
> had no effect on my system with Linux 5.16. So after doing these
> tests, I am against the aggressive approach of breaking the i915
> driver with the "nopat" option because prior to bdd8b6c98239,
> nopat did not break the i915 driver. Why break it now?

Because that is, in my understanding, the purpose of "nopat" (not
breaking the driver, of course, since that's a driver bug, but having
an effect on the driver).

Jan