> -----Original Message----- > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Wednesday, February 26, 2025 12:10 AM > To: liulongfang <liulongfang@xxxxxxxxxx> > Cc: jgg@xxxxxxxxxx; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@xxxxxxxxxx>; Jonathan Cameron > <jonathan.cameron@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx > Subject: Re: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error > > On Tue, 25 Feb 2025 14:27:53 +0800 > Longfang Liu <liulongfang@xxxxxxxxxx> wrote: > > > The dma addresses of EQE and AEQE are wrong after migration and > > results in guest kernel-mode encryption services failure. > > Comparing the definition of hardware registers, we found that > > there was an error when the data read from the register was > > combined into an address. Therefore, the address combination > > sequence needs to be corrected. > > > > Even after fixing the above problem, we still have an issue > > where the Guest from an old kernel can get migrated to > > new kernel and may result in wrong data. > > > > In order to ensure that the address is correct after migration, > > if an old magic number is detected, the dma address needs to be > > updated. > > > > Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live > migration") > > Signed-off-by: Longfang Liu <liulongfang@xxxxxxxxxx> > > --- > > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 40 ++++++++++++++++--- > > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 14 ++++++- > > 2 files changed, 46 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c > b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c > > index 451c639299eb..35316984089b 100644 > > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c > > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c > > @@ -350,6 +350,31 @@ static int vf_qm_func_stop(struct hisi_qm *qm) > > return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0); > > } > > > > +static int vf_qm_version_check(struct acc_vf_data *vf_data, struct device > *dev) > > +{ > > + switch (vf_data->acc_magic) { > > + case ACC_DEV_MAGIC_V2: > > + if (vf_data->major_ver < ACC_DRV_MAJOR_VER || > > + vf_data->minor_ver < ACC_DRV_MINOR_VER) > > + dev_info(dev, "migration driver version not > match!\n"); > > + return -EINVAL; > > + break; > > What's your major/minor update strategy? > > Note that minor_ver is a u16 and ACC_DRV_MINOR_VER is defined as 0, so > testing less than 0 against an unsigned is useless. > > Arguably testing major and minor independently is pretty useless too. > > You're defining what a future "old" driver version will accept, which > is very nearly anything, so any breaking change *again* requires a new > magic, so we're accomplishing very little here. > > Maybe you want to consider a strategy where you'd increment the major > number for a breaking change and minor for a compatible feature. In > that case you'd want to verify the major_ver matches > ACC_DRV_MAJOR_VER > exactly and minor_ver would be more of a feature level. Agree. I think the above check should be just major_ver != ACC_DRV_MAJOR_VER and we can make use of minor version whenever we need a specific handling for a feature support. Also I think it would be good to print the version info above in case of mismatch. > > > + case ACC_DEV_MAGIC_V1: > > + /* Correct dma address */ > > + vf_data->eqe_dma = vf_data- > >qm_eqc_dw[QM_XQC_ADDR_HIGH]; > > + vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET; > > + vf_data->eqe_dma |= vf_data- > >qm_eqc_dw[QM_XQC_ADDR_LOW]; > > + vf_data->aeqe_dma = vf_data- > >qm_aeqc_dw[QM_XQC_ADDR_HIGH]; > > + vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET; > > + vf_data->aeqe_dma |= vf_data- > >qm_aeqc_dw[QM_XQC_ADDR_LOW]; > > + break; > > + default: > > + return -EINVAL; > > + } > > + > > + return 0; > > +} > > + > > static int vf_qm_check_match(struct hisi_acc_vf_core_device > *hisi_acc_vdev, > > struct hisi_acc_vf_migration_file *migf) > > { > > @@ -363,7 +388,8 @@ static int vf_qm_check_match(struct > hisi_acc_vf_core_device *hisi_acc_vdev, > > if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev- > >match_done) > > return 0; > > > > - if (vf_data->acc_magic != ACC_DEV_MAGIC) { > > + ret = vf_qm_version_check(vf_data, dev); > > + if (ret) { > > dev_err(dev, "failed to match ACC_DEV_MAGIC\n"); > > return -EINVAL; > > } > > @@ -418,7 +444,9 @@ static int vf_qm_get_match_data(struct > hisi_acc_vf_core_device *hisi_acc_vdev, > > int vf_id = hisi_acc_vdev->vf_id; > > int ret; > > > > - vf_data->acc_magic = ACC_DEV_MAGIC; > > + vf_data->acc_magic = ACC_DEV_MAGIC_V2; > > + vf_data->major_ver = ACC_DRV_MAR; > > + vf_data->minor_ver = ACC_DRV_MIN; > > Where are "MAR" and "MIN" defined? I can't see how this would even > compile. Thanks, Yes. Please make sure to do a compile test and run basic sanity tested even if you think the changes are minor. Chances are there that you will miss out things like this. Thanks, Shameer