On Mon, 24 Sep 2018 14:16:42 +0200 Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > On 09/24/2018 01:36 PM, Cornelia Huck wrote: > > On Wed, 12 Sep 2018 15:43:03 -0400 > > Tony Krowiak <akrowiak@xxxxxxxxxxxxxxxxxx> wrote: > >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > >> index f8b276a..48b1b78 100644 > >> --- a/drivers/s390/crypto/vfio_ap_ops.c > >> +++ b/drivers/s390/crypto/vfio_ap_ops.c > >> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, > >> return NOTIFY_OK; > >> } > >> > >> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > >> + unsigned int retry) > >> +{ > >> + struct ap_queue_status status; > >> + > >> + do { > >> + status = ap_zapq(AP_MKQID(apid, apqi)); > >> + switch (status.response_code) { > >> + case AP_RESPONSE_NORMAL: > >> + return 0; > >> + case AP_RESPONSE_RESET_IN_PROGRESS: > >> + case AP_RESPONSE_BUSY: > >> + msleep(20); > >> + break; > >> + default: > >> + /* things are really broken, give up */ > >> + return -EIO; > >> + } > >> + } while (retry--); > >> + > >> + return -EBUSY; > > > > So, this function may either return 0, -EIO (things are really broken), > > or -EBUSY (still busy after multiple tries)... > > > >> +} > >> + > >> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > >> +{ > >> + int ret; > >> + int rc = 0; > >> + unsigned long apid, apqi; > >> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > >> + > >> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, > >> + matrix_mdev->matrix.apm_max + 1) { > >> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, > >> + matrix_mdev->matrix.aqm_max + 1) { > >> + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); > >> + if (ret) > >> + rc = ret; > > > > ...and here, we return the last error of any of the resets. Two > > questions: > > > > - Does it make sense to continue if we get -EIO? IOW, does "really > > broken" only refer to a certain tuple and other tuples still can/need > > to be reset? > > I think it does make sense to continue, because IMHO "things are really > broken" is an overstatement (I mean the APQN invalid case). One could > argue would skipping the current card (adapter) be justified or not. A short comment ("even after -EIO, other devices still need to be reset") may be helpful here (remember that I don't have any way to verify this with the architecture). > > IMHO the current code is good enough for the first shot, and we can think > about fine-tuning it later. Sure. > > > - Is the return code useful in any way, as we don't know which tuple it > > refers to? > > > > Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY > is mostly fine given what the architecture say if we are satisfied with just > reset. And the cases behind -EIO might actually be OK too in the same sense. > My guess is, that based on the return value client code can tell if we have > zeroize for all queues or basically just reset (like rapq). We could log that > to some debug facility or whatever -- I guess, but at the moment we don't care. Logging would probably be more useful than the return code, but that can be added later. > > In the end I think the code is good enough as is, and if we want we can > improve on it later. I don't object to that; but this is all a bit confusing to readers without access to the architecture, so I think a comment or two would really improve things. > > Regards, > Halil > > > >> + } > >> + } > >> + > >> + return rc; > >> +} > >> + > >> static int vfio_ap_mdev_open(struct mdev_device *mdev) > >> { > >> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > >> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) > >> if (matrix_mdev->kvm) > >> kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > >> > >> + vfio_ap_mdev_reset_queues(mdev); > >> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, > >> &matrix_mdev->group_notifier); > >> matrix_mdev->kvm = NULL; > > >