On 12/20/22 12:24 PM, Halil Pasic wrote:
On Tue, 20 Dec 2022 09:33:03 -0500
Anthony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
On 12/19/22 9:10 AM, Halil Pasic wrote:
On Tue, 13 Dec 2022 10:44:37 -0500
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
Clean up IRQ resources even when a PQAP(ZAPQ) function fails with an error
not handled by a case statement.
Why?
If the ZAPQ failed, then instructions submitted to the same queue will
likewise fail. Are you saying it's not safe to assume, therefore, that
interrupts will not be occurring?
Right. We are talking about the default branch here, and I suppose the
codes where we know that it is safe to assume that no reset is needed
are handled separately (AP_RESPONSE_DECONFIGURED).
I'm not convinced that, if we take the default branch, we can safely
assume that we won't see any interrupts.
For example, consider hot-unplug as done by KVM. We modify the
CRYCB/APCB with all vCPUs taken out of SIE, but we don't keep
the vCPUs out of SIE until the resets of the unplugged queues
are done, and we don't do any extra interrupt disablement
with all vCPUs kept out of SIE. So I believe currently there
may be a window where the guest can observe a 01 but the
interrupts are still live. That may be a bug, but IMHO it ain't clear
cut.
But it is not just about interrupts. Before, we returned an error
code, which gets propagated to userspace if this reset was
triggered via the ioctl.
With this change, ret seems to be uninitialized when returned
if we take the code path which you change here. So we would
end up logging a warning and returning garbage?
That was an oversight. The -EIO value was returned previously, so the
ret = -EIO should be set in the default case.
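The fix described here can be sketched in plain C, outside the kernel. This is only an illustration of the pattern, not the actual driver code: the names reset_queue(), free_irq_resources(), struct queue_state and the RC_* constants are hypothetical, the response code values are illustrative, and the handling of the non-default cases is an assumption. The point is that ret is assigned on every branch, including the default one, while the IRQ cleanup still runs there.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical response codes; the values are illustrative only. */
enum {
	RC_NORMAL            = 0x00,
	RC_RESET_IN_PROGRESS = 0x02,
	RC_DECONFIGURED      = 0x03,
};

/* Hypothetical stand-in for the per-queue state. */
struct queue_state {
	int irq_enabled;
};

/* Hypothetical stand-in for the IRQ resource cleanup. */
static void free_irq_resources(struct queue_state *q)
{
	q->irq_enabled = 0;
}

/* Maps a reset response code to a return value; ret is set on every path. */
static int reset_queue(struct queue_state *q, int rc)
{
	int ret;

	switch (rc) {
	case RC_NORMAL:
		ret = 0;
		break;
	case RC_RESET_IN_PROGRESS:
		/* Assumption: the caller retries, so nothing is cleaned up yet. */
		return -EBUSY;
	case RC_DECONFIGURED:
		/* Assumption: no reset is needed for a deconfigured queue. */
		ret = 0;
		break;
	default:
		/* Unhandled code: warn, but still report the failure as -EIO. */
		fprintf(stderr, "unexpected response code %#04x\n", rc);
		ret = -EIO;
		break;
	}

	/* Cleanup runs for every path that reaches this point. */
	free_irq_resources(q);
	return ret;
}
```

With this shape, a call such as reset_queue(&q, 0x55) cleans up and returns -EIO rather than an indeterminate value.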
One could also debate whether RCs introduced down the road
can affect the logic here (even if the statement "if we
see an RC other than 00 and 02, we don't need to pursue a
reset any further, and interrupts are disabled" were
guaranteed to be true now, new RCs could theoretically mess
this up).
I think that would be the case regardless of this change. If new RCs are
introduced, this function ought to be revisited anyway and appropriate
changes made.
I'm afraid this is a step in the wrong direction...
Please explain why.
Sorry, I kept this brief because IMHO it is your job to tell us why
this needs to be changed. But I gave in, as you see.
Regards,
Halil