On Tue, 11 Jun 2019 10:37:55 -0400 Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote: > On 6/7/19 10:29 AM, Halil Pasic wrote: > > On Tue, 4 Jun 2019 15:38:51 -0400 > > Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote: [..] > >>> +static void vfio_ap_wait_for_irqclear(int apqn) > >>> +{ > >>> + struct ap_queue_status status; > >>> + int retry = 5; > >>> + > >>> + do { > >>> + status = ap_tapq(apqn, NULL); > >>> + switch (status.response_code) { > >>> + case AP_RESPONSE_NORMAL: > >>> + case AP_RESPONSE_RESET_IN_PROGRESS: > >>> + if (!status.irq_enabled) > >>> + return; > >>> + /* Fall through */ > >>> + case AP_RESPONSE_BUSY: > >>> + msleep(20); > >>> + break; > >>> + case AP_RESPONSE_Q_NOT_AVAIL: > >>> + case AP_RESPONSE_DECONFIGURED: > >>> + case AP_RESPONSE_CHECKSTOPPED: > >>> + default: > >>> + WARN_ONCE(1, "%s: tapq rc %02x: %04x\n", __func__, > >>> + status.response_code, apqn); > >>> + return; > >> > >> Why not just break out of the loop and just use the WARN_ONCE > >> outside of the loop? > >> > > > > AFAIU the idea was to differentiate between got a strange response_code > > and ran out of retires. > > In both cases, the response code is placed into the message, so one > should be able to discern the reason in either case. This is not > critical, just an observation. > I understand, but the message below does say 'could not clear' while the message above does not. One could infer that information, but I could not do it without digging. So I think keeping these separate does have a certain merit to it. Let's keep it for now. We can change this later if we want. > > > > Actually I suspect that we are fine in case of AP_RESPONSE_Q_NOT_AVAIL, > > AP_RESPONSE_DECONFIGURED and AP_RESPONSE_CHECKSTOPPED in a sense that > > what should be the post-condition of this function is guaranteed to be > > reached. What do you think? > > That would seem to be the case given those response codes indicate the > queue is not accessible. > > > > > While I think that we can do better here, I see this as something that > > should be done on top. > > Are you talking about a patch on top? What do you think needs to be > addressed? > For starters, I'm not sure if the first warning is necessary or even appropriate. See the paragraph starting with 'Actually I suspect that we are fine in case ...'. > > > >>> + } > >>> + } while (--retry); > >>> + > >>> + WARN_ONCE(1, "%s: tapq rc %02x: %04x could not clear IR bit\n", > >>> + __func__, status.response_code, apqn); > >>> +} > >>> + > >>> +/** > >>> + * vfio_ap_free_aqic_resources > >>> + * @q: The vfio_ap_queue > >>> + * > >>> + * Unregisters the ISC in the GIB when the saved ISC not invalid. > >>> + * Unpin the guest's page holding the NIB when it exist. > >>> + * Reset the saved_pfn and saved_isc to invalid values. > >>> + * Clear the pointer to the matrix mediated device. > >>> + * > >>> + */ > > > > [..] > > > >>> +struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) > >>> +{ > >>> + struct ap_qirq_ctrl aqic_gisa = {}; > >>> + struct ap_queue_status status; > >>> + int retries = 5; > >>> + > >>> + do { > >>> + status = ap_aqic(q->apqn, aqic_gisa, NULL); > >>> + switch (status.response_code) { > >>> + case AP_RESPONSE_OTHERWISE_CHANGED: > >>> + case AP_RESPONSE_NORMAL: > >>> + vfio_ap_wait_for_irqclear(q->apqn); > >>> + goto end_free; > >>> + case AP_RESPONSE_RESET_IN_PROGRESS: > >>> + case AP_RESPONSE_BUSY: > >>> + msleep(20); > >>> + break; > >>> + case AP_RESPONSE_Q_NOT_AVAIL: > >>> + case AP_RESPONSE_DECONFIGURED: > >>> + case AP_RESPONSE_CHECKSTOPPED: > >>> + case AP_RESPONSE_INVALID_ADDRESS: > >>> + default: > >>> + /* All cases in default means AP not operational */ > >>> + WARN_ONCE(1, "%s: ap_aqic status %d\n", __func__, > >>> + status.response_code); > >>> + goto end_free; > >> > >> Why not just break out of the loop instead of repeating the WARN_ONCE > >> message? > >> > > > > I suppose the reason is same as above. I'm not entirely happy with this > > code myself. E.g. why do we do retries here -- shouldn't we just fail the > > aqic by the guest? > > According to my reading of the code, it looks like the retries are for > response code AP_RESPONSE_BUSY. Why wouldn't we want to wait until the > queue was not busy anymore? > Does HW/FW wait or does it present AP_RESPONSE_BUSY? (Rhetoric question.) It is for the guest to decide if and how does it wish to wait or otherwise react to AP_RESPONSE_BUSY. Or am I missing something? > > > > [..] > > > >>> +static int handle_pqap(struct kvm_vcpu *vcpu) > >>> +{ > >>> + uint64_t status; > >>> + uint16_t apqn; > >>> + struct vfio_ap_queue *q; > >>> + struct ap_queue_status qstatus = { > >>> + .response_code = AP_RESPONSE_Q_NOT_AVAIL, }; > >>> + struct ap_matrix_mdev *matrix_mdev; > >>> + > >>> + /* If we do not use the AIV facility just go to userland */ > >>> + if (!(vcpu->arch.sie_block->eca & ECA_AIV)) > >>> + return -EOPNOTSUPP; > >>> + > >>> + apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > >>> + mutex_lock(&matrix_dev->lock); > >>> + > >>> + if (!vcpu->kvm->arch.crypto.pqap_hook) > >> > >> Wasn't this already checked in patch 2 prior to calling this > >> function? In fact, doesn't the hook point to this function? > >> > > > > Let us benevolently call this defensive programming. We are actually > > in that callback AFAICT, so it sure was set a moment ago, and I guess > > the client code still holds the kvm.lock so it is guaranteed to stay > > so unless somebody is playing foul. > > Defensive, but completely unnecessary; however, it doesn't negatively > affect the logic in the least. > I agree it is unnecessary. We can get rid of it later. I'm not too keen of altering somebody's patch without a really strong reason. > > > > We can address this with a patch on top. > > [..] Regards, Halil