On 2025/2/26 16:11, Shameerali Kolothum Thodi wrote: > > >> -----Original Message----- >> From: Alex Williamson <alex.williamson@xxxxxxxxxx> >> Sent: Wednesday, February 26, 2025 12:10 AM >> To: liulongfang <liulongfang@xxxxxxxxxx> >> Cc: jgg@xxxxxxxxxx; Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@xxxxxxxxxx>; Jonathan Cameron >> <jonathan.cameron@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux- >> kernel@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx >> Subject: Re: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error >> >> On Tue, 25 Feb 2025 14:27:53 +0800 >> Longfang Liu <liulongfang@xxxxxxxxxx> wrote: >> >>> The dma addresses of EQE and AEQE are wrong after migration and >>> results in guest kernel-mode encryption services failure. >>> Comparing the definition of hardware registers, we found that >>> there was an error when the data read from the register was >>> combined into an address. Therefore, the address combination >>> sequence needs to be corrected. >>> >>> Even after fixing the above problem, we still have an issue >>> where the Guest from an old kernel can get migrated to >>> new kernel and may result in wrong data. >>> >>> In order to ensure that the address is correct after migration, >>> if an old magic number is detected, the dma address needs to be >>> updated. >>> >>> Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live >> migration") >>> Signed-off-by: Longfang Liu <liulongfang@xxxxxxxxxx> >>> --- >>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 40 ++++++++++++++++--- >>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 14 ++++++- >>> 2 files changed, 46 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> index 451c639299eb..35316984089b 100644 >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> @@ -350,6 +350,31 @@ static int vf_qm_func_stop(struct hisi_qm *qm) >>> return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0); >>> } >>> >>> +static int vf_qm_version_check(struct acc_vf_data *vf_data, struct device >> *dev) >>> +{ >>> + switch (vf_data->acc_magic) { >>> + case ACC_DEV_MAGIC_V2: >>> + if (vf_data->major_ver < ACC_DRV_MAJOR_VER || >>> + vf_data->minor_ver < ACC_DRV_MINOR_VER) >>> + dev_info(dev, "migration driver version not >> match!\n"); >>> + return -EINVAL; >>> + break; >> >> What's your major/minor update strategy? >> >> Note that minor_ver is a u16 and ACC_DRV_MINOR_VER is defined as 0, so >> testing less than 0 against an unsigned is useless. >> >> Arguably testing major and minor independently is pretty useless too. >> >> You're defining what a future "old" driver version will accept, which >> is very nearly anything, so any breaking change *again* requires a new >> magic, so we're accomplishing very little here. >> >> Maybe you want to consider a strategy where you'd increment the major >> number for a breaking change and minor for a compatible feature. In >> that case you'd want to verify the major_ver matches >> ACC_DRV_MAJOR_VER >> exactly and minor_ver would be more of a feature level. > > Agree. I think the above check should be just major_ver != ACC_DRV_MAJOR_VER > and we can make use of minor version whenever we need a specific handling for > a feature support. > Well, that's a good way. We only use minor_ver to record important updates of the driver. When there is an important update, minor_ver increases by 1. Major_ver is used to distinguish migration versions. After major_ver is updated, minor_ver returns to 0. Therefore, we can change it to only check major_ver. > Also I think it would be good to print the version info above in case of mismatch. > OK. Thanks. Longfang. >> >>> + case ACC_DEV_MAGIC_V1: >>> + /* Correct dma address */ >>> + vf_data->eqe_dma = vf_data- >>> qm_eqc_dw[QM_XQC_ADDR_HIGH]; >>> + vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET; >>> + vf_data->eqe_dma |= vf_data- >>> qm_eqc_dw[QM_XQC_ADDR_LOW]; >>> + vf_data->aeqe_dma = vf_data- >>> qm_aeqc_dw[QM_XQC_ADDR_HIGH]; >>> + vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET; >>> + vf_data->aeqe_dma |= vf_data- >>> qm_aeqc_dw[QM_XQC_ADDR_LOW]; >>> + break; >>> + default: >>> + return -EINVAL; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> static int vf_qm_check_match(struct hisi_acc_vf_core_device >> *hisi_acc_vdev, >>> struct hisi_acc_vf_migration_file *migf) >>> { >>> @@ -363,7 +388,8 @@ static int vf_qm_check_match(struct >> hisi_acc_vf_core_device *hisi_acc_vdev, >>> if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev- >>> match_done) >>> return 0; >>> >>> - if (vf_data->acc_magic != ACC_DEV_MAGIC) { >>> + ret = vf_qm_version_check(vf_data, dev); >>> + if (ret) { >>> dev_err(dev, "failed to match ACC_DEV_MAGIC\n"); >>> return -EINVAL; >>> } >>> @@ -418,7 +444,9 @@ static int vf_qm_get_match_data(struct >> hisi_acc_vf_core_device *hisi_acc_vdev, >>> int vf_id = hisi_acc_vdev->vf_id; >>> int ret; >>> >>> - vf_data->acc_magic = ACC_DEV_MAGIC; >>> + vf_data->acc_magic = ACC_DEV_MAGIC_V2; >>> + vf_data->major_ver = ACC_DRV_MAR; >>> + vf_data->minor_ver = ACC_DRV_MIN; >> >> Where are "MAR" and "MIN" defined? I can't see how this would even >> compile. Thanks, > > Yes. Please make sure to do a compile test and run basic sanity tested even if you > think the changes are minor. Chances are there that you will miss out things like > this. > > Thanks, > Shameer > > . >