On 12/3/22 18:27, Ashish Gupta (SJC) wrote:
As I have a setup ready to test this out, I can apply these 3 missing
patches and test.
I would like to understand if there are any other patches (from other
authors) that you think would be needed as well.
There shouldn't be any others. On the other hand, they probably do not
apply cleanly; otherwise Greg would have included them. I'm not sure
if the backport will be simpler if more patches are added, or if the
required changes are trivial.
Paolo
Please let me know.
I will start with your patches first.
Regards,
--Ashish Gupta
*From: *Paolo Bonzini <pbonzini@xxxxxxxxxx>
*Date: *Friday, December 2, 2022 at 1:39 PM
*To: *Ashish Gupta (SJC) <ashish.gupta1@xxxxxxxxxxx>
*Cc: *Suresh Gumpula <suresh.gumpula@xxxxxxxxxxx>, Felipe Franciosi
<felipe@xxxxxxxxxxx>, kvm <kvm@xxxxxxxxxxxxxxx>, Sean Christopherson
<seanjc@xxxxxxxxxx>, John Levon <john.levon@xxxxxxxxxxx>, Bijan
Mottahedeh <bijan.mottahedeh@xxxxxxxxxxx>, Eiichi Tsukata
<eiichi.tsukata@xxxxxxxxxxx>
*Subject: *Re: Nvidia GPU PCI passthrough and kernel commit
#5f33887a36824f1e906863460535be5d841a4364
Yes, I think so. Are you going to test a backport of the three missing
patches or would you like me to prepare it?
Thanks for the report and the tests!
Paolo
On Fri, Dec 2, 2022 at 20:59, Ashish Gupta (SJC)
<ashish.gupta1@xxxxxxxxxxx> wrote:
Thanks Paolo,
> All four patches were marked as stable, but it looks like the first
> three did not apply and therefore are not part of 5.10.
It sounds like only a subset of the changes was backported to the
5.10.x kernel and some were not.
Wouldn’t that make the 5.10.x kernel unstable for this kind of issue?
Do you think we should backport all the relevant changes to stable
branches like 5.10.x, including patches from other authors in this
area?
Regards,
--Ashish Gupta
*From: *Paolo Bonzini <pbonzini@xxxxxxxxxx>
*Date: *Thursday, December 1, 2022 at 5:16 PM
*To: *Ashish Gupta (SJC) <ashish.gupta1@xxxxxxxxxxx>, Suresh Gumpula
<suresh.gumpula@xxxxxxxxxxx>, Felipe Franciosi <felipe@xxxxxxxxxxx>
*Cc: *kvm@xxxxxxxxxxxxxxx <kvm@xxxxxxxxxxxxxxx>, seanjc@xxxxxxxxxx
<seanjc@xxxxxxxxxx>, John Levon <john.levon@xxxxxxxxxxx>, Bijan
Mottahedeh <bijan.mottahedeh@xxxxxxxxxxx>, Eiichi Tsukata
<eiichi.tsukata@xxxxxxxxxxx>
*Subject: *Re: Nvidia GPU PCI passthrough and kernel commit
#5f33887a36824f1e906863460535be5d841a4364
On 12/2/22 01:29, Ashish Gupta (SJC) wrote:
> Hi Paolo,
>
> While we were assessing the code change made by commit
> 5f33887a36824f1e906863460535be5d841a4364,
>
> Bijan noticed the following:
>
> From the changed code in commit #
> 5f33887a36824f1e906863460535be5d841a4364 , we see that the following check
>
> !kvm_vcpu_apicv_active(vcpu)
>
> has been removed, so in fact the new code is basically assuming that
> apicv is always active.
Right, instead it checks irqchip_in_kernel(kvm) && enable_apicv. This
is documented in the commit message:
    However, these checks do not attempt to synchronize with changes to
    the IRTE. In particular, there is no path that updates the IRTE
    when APICv is re-activated on vCPU 0; and there is no path to wakeup
    a CPU that has APICv disabled, if the wakeup occurs because of an
    IRTE that points to a posted interrupt.
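
The difference between the two conditions can be sketched with a small stand-alone C model. This is not kernel code: `struct vm_model`, `struct vcpu_model`, and both helper functions are hypothetical stand-ins written for illustration. Only the names `kvm_vcpu_apicv_active`, `irqchip_in_kernel`, and `enable_apicv` come from the discussion above, and the real checks may involve further conditions not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of VM-wide state (not the real struct kvm). */
struct vm_model {
    bool irqchip_in_kernel; /* stands in for irqchip_in_kernel(kvm) */
    bool enable_apicv;      /* stands in for the enable_apicv setting */
};

/* Hypothetical model of per-vCPU state (not the real struct kvm_vcpu). */
struct vcpu_model {
    const struct vm_model *vm;
    bool apicv_active;      /* stands in for kvm_vcpu_apicv_active(vcpu) */
};

/* Old-style condition: the answer can differ from one vCPU to the next. */
static bool pi_condition_per_vcpu(const struct vcpu_model *vcpu)
{
    assert(vcpu != NULL);
    return vcpu->apicv_active;
}

/* New-style condition: one stable answer for the whole VM. */
static bool pi_condition_per_vm(const struct vm_model *vm)
{
    assert(vm != NULL);
    return vm->irqchip_in_kernel && vm->enable_apicv;
}
```

In this model, two vCPUs of the same VM can disagree under the per-vCPU check when only one of them has `apicv_active` set, while the per-VM check gives both the same answer. That is the sense in which the newer condition is stable with respect to the IRTE.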
The full series is at
https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_20211123004311.2954158-2D2-2Dpbonzini-40redhat.com_T_&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=NSViKyfbZLLlRE5iJBGkhRVXJKqWdgMN8wGfv1tfc2E&m=iEB57vPMXHVPBeayAOwoHp32BcSlX-J5ig4nd4bnfDs1XqL3ykppJ1b1qVu9cuz_&s=nlSZ4vVygCrPKCaCRjJWrVFphM6Pym_iVYc-fBbjrc4&e=
and has more details:
    Now that APICv can be disabled per-CPU (depending on whether it has
    some setup that is incompatible) we need to deal with guests having
    a mix of vCPUs with enabled/disabled posted interrupts. For
    assigned devices, their posted interrupt configuration must be the
    same across the whole VM, so handle posted interrupts by hand on
    vCPUs with disabled posted interrupts.
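
What "handling posted interrupts by hand" amounts to can be illustrated with a simplified, single-threaded C model: pending bits are copied from the posted-interrupt request bitmap (PIR) into the local APIC's interrupt request register (IRR) in software. The structures and the function `sync_pir_to_irr_model` below are hypothetical and heavily simplified; the actual kernel code uses atomic operations and a different layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PIR_WORDS 8 /* 256 interrupt vectors, 32 per word */

/* Hypothetical model of a posted-interrupt descriptor. */
struct pi_desc_model {
    uint32_t pir[PIR_WORDS]; /* posted requests, one bit per vector */
    bool on;                 /* "outstanding notification" flag */
};

/* Hypothetical model of the local APIC's interrupt request register. */
struct lapic_model {
    uint32_t irr[PIR_WORDS];
};

/*
 * Move pending bits from PIR to IRR in software and return the highest
 * pending vector, or -1 if none is pending. Real code must do this
 * atomically; this single-threaded model does not.
 */
static int sync_pir_to_irr_model(struct pi_desc_model *pi,
                                 struct lapic_model *apic)
{
    assert(pi != NULL && apic != NULL);

    if (pi->on) {
        pi->on = false;
        for (int i = 0; i < PIR_WORDS; i++) {
            apic->irr[i] |= pi->pir[i];
            pi->pir[i] = 0;
        }
    }

    /* Scan from the highest vector down to find the top pending one. */
    for (int i = PIR_WORDS - 1; i >= 0; i--) {
        for (int bit = 31; bit >= 0; bit--) {
            if (apic->irr[i] & (UINT32_C(1) << bit))
                return i * 32 + bit;
        }
    }
    return -1;
}
```

The point of the model is only that a vCPU without hardware posted-interrupt delivery can still pick up requests that an assigned device posted to its descriptor, provided software performs this copy before deciding which interrupt to inject.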
All four patches were marked as stable, but it looks like the first
three did not apply and therefore are not part of 5.10.
78311a514099932cd8434d5d2194aa94e56ab67c
KVM: x86: ignore APICv if LAPIC is not enabled
7e1901f6c86c896acff6609e0176f93f756d8b2a
KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
37c4dbf337c5c2cdb24365ffae6ed70ac1e74d7a
KVM: x86: check PIR even for vCPUs with disabled APICv
The three commits do not have any subsequent commit that Fixes them.
> The latest upstream code however seems to disable apicv conditionally
> depending on if it is actually being used:
Right.
> We found that once we disable the Hyper-V enlightenments for the Linux
> VM, everything works fine (on v5.10.84).
>
> Further, Eiichi noticed that your change was introduced in 5.16 and
> backported to 5.10.84.
>
> On the other hand, Vitaly's patch (commit
> #0f250a646382e017725001a552624be0c86527bf) was introduced in 5.15 and
> NOT backported to 5.10.X.
>
> Should we backport Vitaly's patch to stable 5.10.X? Do you think that
> will solve the issue we are facing?
As you found out, there are a lot of dependent changes needed to
introduce __kvm_request_apicv_update, so it's not really feasible.
Paolo