On 13.07.2022 13:10, Chuck Zmudzinski wrote: > On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote: >> On 7/13/2022 5:09 AM, Jan Beulich wrote: >>> On 13.07.2022 10:51, Chuck Zmudzinski wrote: >>>> On 7/13/22 2:18 AM, Jan Beulich wrote: >>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote: >>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) >>>>>> *Add the necessary code to incorporate the "nopat" fix >>>>>> *void init_cache_modes(void) -> void __init init_cache_modes(void) >>>>>> *Add Jan Beulich as Co-developer (Jan has not signed off yet) >>>>>> *Expand the commit message to include relevant parts of the commit >>>>>> message of Jan Beulich's proposed patch for this problem >>>>>> *Fix 'else if ... {' placement and indentation >>>>>> *Remove indication the backport to stable branches is only back to 5.17.y >>>>>> >>>>>> I think these changes address all the comments on the original patch >>>>>> >>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to >>>>>> include Jan's idea for fixing "nopat" that was missing from the first >>>>>> version of the patch. >>>>> >>>>> You've sufficiently altered this change to clearly no longer want my >>>>> S-o-b; unfortunately in fact I think you broke things: >>>> >>>> Well, I hope we can come to an agreement so I have >>>> your S-o-b. But that would probably require me to remove >>>> Juergen's R-b. >>>> >>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void) >>>>>> rdmsrl(MSR_IA32_CR_PAT, pat); >>>>>> } >>>>>> >>>>>> - if (!pat) { >>>>>> + if (!pat || pat_force_disabled) { >>>>> >>>>> By checking the new variable here ... >>>>> >>>>>> /* >>>>>> * No PAT. Emulate the PAT table that corresponds to the two >>>>>> * cache bits, PWT (Write Through) and PCD (Cache Disable). >>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void) >>>>>> */ >>>>>> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >>>>>> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); >>>>> >>>>> ... you put in place a software view which doesn't match hardware. I >>>>> continue to think that ... >>>>> >>>>>> + } else if (!pat_bp_enabled) { >>>>> >>>>> ... the variable wants checking here instead (at which point, yes, >>>>> this comes quite close to simply being a v2 of my original patch). >>>>> >>>>> By using !pat_bp_enabled here you actually broaden where the change >>>>> would take effect. Iirc Boris had asked to narrow things (besides >>>>> voicing opposition to this approach altogether). Even without that >>>>> request I wonder whether you aren't going to far with this. >>>>> >>>>> Jan >>>> >>>> I thought about checking for the administrator's "nopat" >>>> setting where you suggest which would limit the effect >>>> of "nopat" to not reporting PAT as enabled to device >>>> drivers who query for PAT availability using pat_enabled(). >>>> The main reason I did not do that is that due to the fact >>>> that we cannot write to the PAT MSR, we cannot really >>>> disable PAT. But we come closer to respecting the wishes >>>> of the administrator by configuring the caching modes as >>>> if PAT is actually disabled by the hardware or firmware >>>> when in fact it is not. >>>> >>>> What would you propose logging as a message when >>>> we report PAT as disabled via pat_enabled()? The main >>>> reason I did not choose to check the new variable in the >>>> new 'else if' block is that I could not figure out what to >>>> tell the administrator in that case. I think we would have >>>> to log something like, "nopat is set, but we cannot disable >>>> PAT, doing our best to disable PAT by not reporting PAT >>>> as enabled via pat_enabled(), but that does not guarantee >>>> that kernel drivers and components cannot use PAT if they >>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT) >>>> instead of pat_enabled()." However, I acknowledge WC mappings >>>> would still be disabled because arch_can_pci_mmap_wc() will >>>> be false if pat_enabled() is false. >>>> >>>> Perhaps we also need to log something if we keep the >>>> check for "nopat" where I placed it. We could say something >>>> like: "nopat is set, but we cannot disable hardware/firmware >>>> PAT support, so we are emulating as if there is no PAT support >>>> which puts in place a software view that does not match >>>> hardware." >>>> >>>> No matter what, because we cannot write to PAT MSR in >>>> the Xen PV case, we probably need to log something to >>>> explain the problems associated with trying to honor the >>>> administrator's request. Also, what log level should it be. >>>> Should it be a pr_warn instead of a pr_info? >>> >>> I'm afraid I'm the wrong one to answer logging questions. As you >>> can see from my original patch, I didn't add any new logging (and >>> no addition was requested in the comments that I have got). I also >>> don't think "nopat" has ever meant "disable PAT", as the feature >>> is either there or not. Instead I think it was always seen as >>> "disable fiddling with PAT", which by implication means using >>> whatever is there (if the feature / MSR itself is available). >> >> IIRC, I do think I mentioned in the comments on your patch that >> it would be preferable to mention in the commit message that >> your patch would change the current behavior of "nopat" on >> Xen. The question is, how much do we want to change the >> current behavior of "nopat" on Xen. I think if we have to change >> the current behavior of "nopat" on Xen and if we are going >> to propagate that change to all current stable branches all >> the way back to 4.9.y,, we better make a lot of noise about >> what we are doing here. >> >> Chuck > > And in addition, if we are going to backport this patch to > all current stable branches, we better have a really, really, > good reason for changing the behavior of "nopat" on Xen. > > Does such a reason exist? Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo. Jan > Or perhaps, Juergen, could you > accept a v3 of my patch that does not need to decide > how to backport the change to "nopat" to the stable branches > that are affected by the current behavior of "nopat" on Xen? > > To do such a v3, I would just have to fix the style problems > with my original patch and not come to an agreement with > Jan about how to deal with the "nopat" problem. > > Chuck > > Chuck