On 2024/4/12 16:15, liulongfang wrote: > On 2024/4/5 4:07, Alex Williamson wrote: >> On Tue, 2 Apr 2024 11:24:31 +0800 >> Longfang Liu <liulongfang@xxxxxxxxxx> wrote: >> >>> On the debugfs framework of VFIO, if the CONFIG_VFIO_DEBUGFS macro is >>> enabled, the debug function is registered for the live migration driver >>> of the HiSilicon accelerator device. >>> >>> After registering the HiSilicon accelerator device on the debugfs >>> framework of live migration of vfio, a directory file "hisi_acc" >>> of debugfs is created, and then three debug function files are >>> created in this directory: >>> >>> vfio >>> | >>> +---<dev_name1> >>> | +---migration >>> | +--state >>> | +--hisi_acc >>> | +--attr >>> | +--data >>> | +--save >>> | +--cmd_state >>> | >>> +---<dev_name2> >>> +---migration >>> +--state >>> +--hisi_acc >>> +--attr >>> +--data >>> +--save >>> +--cmd_state >>> >>> data file: used to get the migration data from the driver >>> attr file: used to get device attributes parameters from the driver >>> save file: used to read the data of the live migration device and save >>> it to the driver. >>> cmd_state: used to get the cmd channel state for the device. >>> >>> +----------------+ +--------------+ +---------------+ >>> | migration dev | | src dev | | dst dev | >>> +-------+--------+ +------+-------+ +-------+-------+ >>> | | | >>> | | | >>> | | | >>> | | | >>> save | +------v-------+ +-------v-------+ >>> | | saving_mif | | resuming_migf | >>> | | file | | file | >>> | +------+-------+ +-------+-------+ >>> | | | >>> | mutex | | >>> +-------v--------+ | | >>> | | | | >>> | debug_migf file<---------------+-----------------------+ >>> | | copy >>> +-------+--------+ >>> | >>> cat | >>> | >>> +-------v--------+ >>> | user | >>> +----------------+ >>> >>> In debugfs scheme. The driver creates a separate debug_migf file. >>> It is completely separated from the two files of live migration, >>> thus preventing debugfs data from interfering with migration data. >>> Moreover, it only performs read operations on the device. >>> >>> For serialization of debugfs: >>> First, it only writes data when performing a debugfs save operation. >> >> This distinction between "writing" and "copying" is very confusing. >> > > "Writing" means reading data from the device and writing it to debug_migf. > "Copying" is to copy the data that has been saved in other migf files to > debug_migf. > The destination of both operations is the same. > The data sources are different. > >>> Second, it is only copied from the file on the migration device >>> when the live migration is complete. >> >> Why does it do this at all? If you're looking for a postmortem of the >> user generated buffer, that should be explicitly stated. >> > > debug_migf is a data buffer. Used to cache debugfs data for users > >>> These two operations are mutually exclusive through mutex. >> >> The mutual exclusion between debugfs operations is not the concern, the >> question is whether there's serialization that prevents the debugfs >> operations from interfering with the user migration flow. Nothing here >> seems to prevent concurrent use of the debugfs interface proposed here >> with a user migration. >> > > Reading data directly from the device does not cause any problems. > The device supports concurrent requests for read operations. > Therefore, there is no mutual exclusion between the debugfs save > operation and the user migration operation. > >>> >>> Signed-off-by: Longfang Liu <liulongfang@xxxxxxxxxx> >>> --- >>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 204 ++++++++++++++++++ >>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 11 + >>> 2 files changed, 215 insertions(+) >>> >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> index bf358ba94b5d..9f563a31a2a1 100644 >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c >>> @@ -15,6 +15,7 @@ >>> #include <linux/anon_inodes.h> >>> >>> #include "hisi_acc_vfio_pci.h" >>> +#include "../../vfio.h" >> >> This include seems not to be required. >> > > OK, let me modify it and verify it. > >>> >>> /* Return 0 on VM acc device ready, -ETIMEDOUT hardware timeout */ >>> static int qm_wait_dev_not_ready(struct hisi_qm *qm) >>> @@ -618,6 +619,22 @@ hisi_acc_check_int_state(struct hisi_acc_vf_core_device *hisi_acc_vdev) >>> } >>> } >>> >>> +static void hisi_acc_vf_migf_save(struct hisi_acc_vf_core_device *hisi_acc_vdev, >>> + struct hisi_acc_vf_migration_file *src_migf) >> >> Seems this should be named something relative to debug since it's only >> purpose is to copy the migration file to the debug migration file. > > How about hisi_acc_debug_migf_copy? > >> >>> +{ >>> + struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev->debug_migf; >>> + >>> + if (!dst_migf) >>> + return; >>> + >>> + mutex_lock(&hisi_acc_vdev->enable_mutex); >>> + dst_migf->disabled = src_migf->disabled; >> >> In the cases where this is called, the caller is about to call >> hisi_acc_vf_disable_fd() which sets disabled = true and then >> hisi_acc_vf_debug_save() doesn't touch the value! What does it even >> mean to copy this value, let alone print it as part of the debugfs >> output later? >> > > Yes, the disable assignment of debug_migf needs to be processed > in hisi_acc_vf_debug_save. > >> >>> + dst_migf->total_length = src_migf->total_length; >>> + memcpy(&dst_migf->vf_data, &src_migf->vf_data, >>> + sizeof(struct acc_vf_data)); >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> +} >>> + >>> static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf) >>> { >>> mutex_lock(&migf->lock); >>> @@ -630,12 +647,14 @@ static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf) >>> static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vdev) >>> { >>> if (hisi_acc_vdev->resuming_migf) { >>> + hisi_acc_vf_migf_save(hisi_acc_vdev, hisi_acc_vdev->resuming_migf); >>> hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf); >>> fput(hisi_acc_vdev->resuming_migf->filp); >>> hisi_acc_vdev->resuming_migf = NULL; >>> } >>> >>> if (hisi_acc_vdev->saving_migf) { >>> + hisi_acc_vf_migf_save(hisi_acc_vdev, hisi_acc_vdev->saving_migf); >> >> Why are these buffers copied to the debug_migf in this case? This can >> happen asynchronous to accessing the debugfs migration file and there's >> no serialization. >> > > We can try to copy when accessing debugfs. > I tried migrating the file to save data when accessing debugfs. This solution will not work. resuming_migf and saving_migf will be kfree after completing the migration, and their data will no longer be saved. Therefore, the data needs to be saved to debug_migf when calling "hisi_acc_vf_disable_fd". In addition, reading the migration status data of device can be read directly during debugfs access, without using debug_migf. Thanks, Longfang. >>> hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf); >>> fput(hisi_acc_vdev->saving_migf->filp); >>> hisi_acc_vdev->saving_migf = NULL; >>> @@ -1144,6 +1163,7 @@ static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev) >>> if (!vf_qm->io_base) >>> return -EIO; >>> >>> + mutex_init(&hisi_acc_vdev->enable_mutex); >>> vf_qm->fun_type = QM_HW_VF; >>> vf_qm->pdev = vf_dev; >>> mutex_init(&vf_qm->mailbox_lock); >>> @@ -1294,6 +1314,181 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int >>> return vfio_pci_core_ioctl(core_vdev, cmd, arg); >>> } >>> >>> +static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device *vdev) >>> +{ >>> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev); >>> + struct hisi_acc_vf_migration_file *migf = hisi_acc_vdev->debug_migf; >>> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm; >>> + int ret; >>> + >> >> lockdep_assert_held(...) >> >>> + if (!vdev->mig_ops || !migf) { >>> + seq_printf(seq, "%s\n", "device does not support live migration!"); >>> + return -EINVAL; >> >> Isn't the -EINVAL sufficient? >> > > Which one do you think is better? > >>> + } >>> + >>> + /** >>> + * When the device is not opened, the io_base is not mapped. >>> + * The driver cannot perform device read and write operations. >>> + */ >>> + if (hisi_acc_vdev->dev_opened != DEV_OPEN) { >> >> Why are we assigning and testing a bool against and enum?! >> > > OK, change to true and false assignment. > >>> + seq_printf(seq, "%s\n", "device not opened!"); >>> + return -EINVAL; >>> + } >>> + >>> + ret = qm_wait_dev_not_ready(vf_qm); >>> + if (ret) { >>> + seq_printf(seq, "%s\n", "VF device not ready!"); >>> + return -EINVAL; >> >> -EBUSY? Again, not sure why we need the seq_printf() in addition to >> the error value. >> > > OK, -EBUSY is better. > seq_printf() allows users to directly obtain the cause of the > error without checking dmesg. > > >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static int hisi_acc_vf_debug_cmd(struct seq_file *seq, void *data) >>> +{ >>> + struct device *vf_dev = seq->private; >>> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev); >>> + struct vfio_device *vdev = &core_device->vdev; >>> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev); >>> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm; >>> + u64 value; >>> + int ret; >>> + >>> + mutex_lock(&hisi_acc_vdev->enable_mutex); >>> + ret = hisi_acc_vf_debug_check(seq, vdev); >>> + if (ret) { >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> + return 0; >> >> Why do we squash the error return here and throughout? >> > seq_printf() gives the user a failure message. > This can be changed to "return ret;" > >>> + } >>> + >>> + value = readl(vf_qm->io_base + QM_MB_CMD_SEND_BASE); >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> + seq_printf(seq, "%s:0x%llx\n", "mailbox cmd channel state is OK", value); >> >> We didn't test the value, what makes the state OK? Can this readl() or >> those in qm_wait_dev_not_ready() interfere with the main device flow? >> > > If the cmd channel is normal, it will return a non-all-F value. > Add exception checking in the next version. > > cmd channel read operation will not affect the main migration process. > >>> + >>> + return 0; >>> +} >>> + >>> +static int hisi_acc_vf_debug_save(struct seq_file *seq, void *data) >>> +{ >>> + struct device *vf_dev = seq->private; >>> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev); >>> + struct vfio_device *vdev = &core_device->vdev; >>> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev); >>> + struct hisi_acc_vf_migration_file *migf = hisi_acc_vdev->debug_migf; >>> + struct acc_vf_data *vf_data = &migf->vf_data; >>> + int ret; >>> + >>> + mutex_lock(&hisi_acc_vdev->enable_mutex); >>> + ret = hisi_acc_vf_debug_check(seq, vdev); >>> + if (ret) { >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> + return 0; >>> + } >>> + >>> + vf_data->vf_qm_state = QM_READY; >>> + ret = vf_qm_read_data(&hisi_acc_vdev->vf_qm, vf_data); >>> + if (ret) { >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> + seq_printf(seq, "%s\n", "failed to save device data!"); >>> + return 0; >>> + } >>> + >>> + migf->total_length = sizeof(struct acc_vf_data); >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> + seq_printf(seq, "%s\n", "successful to save device data!"); >>> + >>> + return 0; >>> +} >>> + >>> +static int hisi_acc_vf_data_read(struct seq_file *seq, void *data) >>> +{ >>> + struct device *vf_dev = seq->private; >>> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev); >>> + struct vfio_device *vdev = &core_device->vdev; >>> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev); >>> + struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev->debug_migf; >>> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding); >>> + >>> + if (debug_migf && debug_migf->total_length) >>> + seq_hex_dump(seq, "Mig Data:", DUMP_PREFIX_OFFSET, 16, 1, >>> + (unsigned char *)&debug_migf->vf_data, >>> + vf_data_sz, false); >>> + else >>> + seq_printf(seq, "%s\n", "device not migrated!"); >> >> "device state not saved"? Although I don't recall why this doesn't >> just return an errno. >> > > OK, those who exit directly without migration will be processed according > to the error mode and an error code will be returned. > >>> + >>> + return 0; >>> +} >>> + >>> +static int hisi_acc_vf_attr_read(struct seq_file *seq, void *data) >>> +{ >>> + struct device *vf_dev = seq->private; >>> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev); >>> + struct vfio_device *vdev = &core_device->vdev; >>> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev); >>> + struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev->debug_migf; >>> + >>> + if (debug_migf && debug_migf->total_length) { >>> + seq_printf(seq, >>> + "acc device:\n" >>> + "device state: %d\n" >>> + "device ready: %u\n" >>> + "data valid: %d\n" >>> + "data size: %lu\n", >>> + hisi_acc_vdev->mig_state, >> >> This is redundant to migration/state, however note > > OK! > >> hisi_acc_vfio_pci_get_device_state() protects the value with state >> mutex while reading it. >> >>> + hisi_acc_vdev->vf_qm_state, >> >> What's the purpose of this, it's ready or not, what does that mean? >> > > If this status is not ready, the live migration process will exit directly. > It indicates that there are two possible exceptions: > 1. The acc device driver in the VM does not have insmod. > 2. The acc device driver in the VM is insmoded, but the cmd channel is abnormal. > >>> + debug_migf->disabled, >> >> What's the purpose of this? > > Get the enable status of migf file in the driver. > >> >>> + debug_migf->total_length); >> >> Why not just have this printed or inferred via the above data_read >> function, this all seems unnecessary. >> > > This is used to obtain some key status of the live migration driver. > It is more important than data in problem location. > So it is output in key-value pairs. > > The above data is directly output in the form of hexadecimal data. > It is used for more detailed analysis when there are no abnormalities > in the key status. > >>> + } else { >>> + seq_printf(seq, "%s\n", "device not migrated!"); >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static int hisi_acc_vfio_debug_init(struct hisi_acc_vf_core_device *hisi_acc_vdev) >>> +{ >>> + struct vfio_device *vdev = &hisi_acc_vdev->core_device.vdev; >>> + struct dentry *vfio_dev_migration = NULL; >>> + struct dentry *vfio_hisi_acc = NULL; >>> + struct device *dev = vdev->dev; >>> + void *migf = NULL; >>> + >>> + if (!debugfs_initialized()) >>> + return 0; >>> + >>> + migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL); >>> + if (!migf) >>> + return -ENOMEM; >>> + hisi_acc_vdev->debug_migf = migf; >>> + >>> + vfio_dev_migration = debugfs_lookup("migration", vdev->debug_root); >> >> Fails to build without CONFIG_DEBUG_FS=y There should be a dependency >> on CONFIG_VFIO_DEBUGFS here somewhere. >> > > Yes, the driver needs to add "!IS_ENABLED(CONFIG_VFIO_DEBUGFS)" > behind "debugfs_initialized()" above. > >>> + if (!vfio_dev_migration) { >>> + kfree(migf); >> >> hisi_acc_vdev->debug_migf still points at this freed buffer, the return >> value of this function is not tested, allows a use-after-free in >> all of the below debugfs interfaces. >> > > Yes, there needs to add "hisi_acc_vdev->debug_migf = NULL" > >>> + dev_err(dev, "failed to lookup migration debugfs file!\n"); >>> + return -ENODEV; >>> + } >>> + >>> + vfio_hisi_acc = debugfs_create_dir("hisi_acc", vfio_dev_migration); >>> + debugfs_create_devm_seqfile(dev, "data", vfio_hisi_acc, >>> + hisi_acc_vf_data_read); >>> + debugfs_create_devm_seqfile(dev, "attr", vfio_hisi_acc, >>> + hisi_acc_vf_attr_read); >>> + debugfs_create_devm_seqfile(dev, "cmd_state", vfio_hisi_acc, >>> + hisi_acc_vf_debug_cmd); >>> + debugfs_create_devm_seqfile(dev, "save", vfio_hisi_acc, >>> + hisi_acc_vf_debug_save); >>> + >>> + return 0; >>> +} >>> + >>> +static void hisi_acc_vf_debugfs_exit(struct hisi_acc_vf_core_device *hisi_acc_vdev) >>> +{ >>> + if (!debugfs_initialized()) >>> + return; >>> + >>> + kfree(hisi_acc_vdev->debug_migf); >> >> Double free if the lookup in init fails. >> > > After adding "hisi_acc_vdev->debug_migf = NULL" above. > These processing codes need to be added here: > > if (hisi_acc_vdev->debug_migf) > kfree(hisi_acc_vdev->debug_migf); > >>> +} >>> + >>> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev) >>> { >>> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev); >>> @@ -1311,9 +1506,11 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev) >>> return ret; >>> } >>> hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING; >>> + hisi_acc_vdev->dev_opened = DEV_OPEN; >> >> = true! >> > > OK, the next version will not use enumeration values. will use true/false assignment. > >>> } >>> >>> vfio_pci_core_finish_enable(vdev); >>> + >>> return 0; >>> } >>> >>> @@ -1322,7 +1519,10 @@ static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev) >>> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev); >>> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm; >>> >>> + hisi_acc_vdev->dev_opened = DEV_CLOSE; >>> + mutex_lock(&hisi_acc_vdev->enable_mutex); >>> iounmap(vf_qm->io_base); >>> + mutex_unlock(&hisi_acc_vdev->enable_mutex); >>> vfio_pci_core_close_device(core_vdev); >>> } >>> >>> @@ -1413,6 +1613,9 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device >>> ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device); >>> if (ret) >>> goto out_put_vdev; >>> + >>> + if (ops == &hisi_acc_vfio_pci_migrn_ops) >>> + hisi_acc_vfio_debug_init(hisi_acc_vdev); >>> return 0; >>> >>> out_put_vdev: >>> @@ -1425,6 +1628,7 @@ static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev) >>> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev); >>> >>> vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device); >>> + hisi_acc_vf_debugfs_exit(hisi_acc_vdev); >>> vfio_put_device(&hisi_acc_vdev->core_device.vdev); >>> } >>> >>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h >>> index 7a9dc87627cd..3a20d81d105c 100644 >>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h >>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h >>> @@ -52,6 +52,11 @@ >>> #define QM_EQC_DW0 0X8000 >>> #define QM_AEQC_DW0 0X8020 >>> >>> +enum acc_dev_state { >>> + DEV_CLOSE = 0x0, >>> + DEV_OPEN, >>> +}; >>> + >>> struct acc_vf_data { >>> #define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state) >>> /* QM match information */ >>> @@ -114,5 +119,11 @@ struct hisi_acc_vf_core_device { >>> int vf_id; >>> struct hisi_acc_vf_migration_file *resuming_migf; >>> struct hisi_acc_vf_migration_file *saving_migf; >>> + >>> + /* To make sure the device is enabled */ >>> + struct mutex enable_mutex; >>> + bool dev_opened; >>> + /* For debugfs */ >>> + struct hisi_acc_vf_migration_file *debug_migf; >>> }; >>> #endif /* HISI_ACC_VFIO_PCI_H */ >> >> . >> > Thank you very much for your careful inspection. > I will revise the inspection issues you mentioned above > one by one and publish them in the next version. > > Thanks again! > Longfang. > > . >