On 4/14/2023 5:43 AM, Jason Gunthorpe wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Tue, Apr 04, 2023 at 12:01:37PM -0700, Brett Creeley wrote:
@@ -30,13 +34,23 @@ pds_vfio_pci_probe(struct pci_dev *pdev,
dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev);
pds_vfio->pdev = pdev;
+ pds_vfio->pdsc = pdsc_get_pf_struct(pdev);
This should not be a void *, it has a type, looks like it is 'struct
pdsc *' - comment applies to all the places in both series that
dropped the type here.
Will fix.
Jason
Hey Jason,
Thanks for the responses/feedback.
For some reason Shannon and I didn't get any of your recent responses in
our inboxes except this one. We're not really sure why... Due to this,
I'm replying to all of your responses in this thread.
>> + union pds_core_adminq_cmd cmd = { 0 };
> These should all be = {}, adding the 0 is a subtly different thing in
Will fix.
>> +int
>> +pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio)
>> +{
>> + struct pds_lm_suspend_cmd cmd = {
>> + .opcode = PDS_LM_CMD_SUSPEND,
>> + .vf_id = cpu_to_le16(pds_vfio->vf_id),
>> + };
>> + struct pds_lm_suspend_comp comp = {0};
>> + struct pci_dev *pdev = pds_vfio->pdev;
>> + int err;
>> +
>> + dev_dbg(&pdev->dev, "vf%u: Suspend device\n", pds_vfio->vf_id);
>> +
>> + err = pds_client_adminq_cmd(pds_vfio,
>> + (union pds_core_adminq_cmd *)&cmd,
>> + sizeof(cmd),
>> + (union pds_core_adminq_comp *)&comp,
>> + PDS_AQ_FLAG_FASTPOLL);
> These casts to a union are really weird, why isn't the union the type
> on the stack?
Yeah, this is an artifact of initial development that allowed us to
completely de-couple pds_lm.h from pds_adminq.h, but there don't seem to
be any conflicts including inluding pds_lm.h in pds_adminq.h. So, it
will be fixed in the next revision.
>> +
>> + /* alloc sgl */
>> + sgl = dma_alloc_coherent(dev, lm_file->num_sge *
>> + sizeof(struct pds_lm_sg_elem),
>> + &lm_file->sgl_addr, GFP_KERNEL);
> Do you really need a coherent allocation for this?
I don't think it is needed. I will look into this and fix it if
dma_alloc_coherent() isn't needed.
>> +#define PDS_VFIO_LM_FILENAME "pds_vfio_lm"
> This doesn't need a define, it is typical to write the pseudo filename
> in the only anon_inode_getfile()
Yeah, we technically only use it in one spot, so it makes sense to just
have the string inlined to the anon_inode_getfile() call. This also
allows us to get rid of the const char *name argument in
pds_vfio_get_lm_file(). Will fix in the next revision.
>> +static struct pds_vfio_lm_file *
>> +pds_vfio_get_lm_file(const char *name, const struct file_operations
*fops,
>> + int flags, u64 size)
>> +{
>> + struct pds_vfio_lm_file *lm_file = NULL;
>> + unsigned long long npages;
>> + struct page **pages;
>> + int err = 0;
>> +
>> + if (!size)
>> + return NULL;
>> +
>> + /* Alloc file structure */
>> + lm_file = kzalloc(sizeof(*lm_file), GFP_KERNEL);
>> + if (!lm_file)
>> + return NULL;
>> +
>> + /* Create file */
>> + lm_file->filep = anon_inode_getfile(name, fops, lm_file, flags);
>> + if (!lm_file->filep)
>> + goto err_get_file;
>> +
>> + stream_open(lm_file->filep->f_inode, lm_file->filep);
>> + mutex_init(&lm_file->lock);
>> +
>> + lm_file->size = size;
>> +
>> + /* Allocate memory for file pages */
>> + npages = DIV_ROUND_UP_ULL(lm_file->size, PAGE_SIZE);
>> +
>> + pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
>> + if (!pages)
>> + goto err_alloc_pages;
>> +
>> + for (unsigned long long i = 0; i < npages; i++) {
>> + pages[i] = alloc_page(GFP_KERNEL);
>> + if (!pages[i])
>> + goto err_alloc_page;
>> + }
>> +
>> + lm_file->pages = pages;
>> + lm_file->npages = npages;
>> + lm_file->alloc_size = npages * PAGE_SIZE;
>> +
>> + /* Create scatterlist of file pages to use for DMA mapping later */
>> + err = sg_alloc_table_from_pages(&lm_file->sg_table, pages, npages,
>> + 0, size, GFP_KERNEL);
>> + if (err)
>> + goto err_alloc_sg_table;
> This is the same basic thing the mlx5 driver does, you should move the
> mlx5 code into some common place and just re-use it here.
I looked at the mlx5 code and even though the two drivers are doing the
same basic thing, IMHO it doesn't seem like a straight forward task as
the mlx5 code seems to have some device/driver specifics mixed in. I'd
prefer not trying to refactor/commonize this bit of code at this point
in time.
However, it does seem like a good future improvement once things quiet
down after getting this initial series merged.
>> diff --git a/drivers/vfio/pci/pds/vfio_dev.h
b/drivers/vfio/pci/pds/vfio_dev.h
>> index 10557e8dc829..3f55861ffc7c 100644
>> --- a/drivers/vfio/pci/pds/vfio_dev.h
>> +++ b/drivers/vfio/pci/pds/vfio_dev.h
>> @@ -7,10 +7,20 @@
>> #include <linux/pci.h>
>> #include <linux/vfio_pci_core.h>
>>
>> +#include "lm.h"
>> +
>> struct pds_vfio_pci_device {
>> struct vfio_pci_core_device vfio_coredev;
>> struct pci_dev *pdev;
>> void *pdsc;
>> + struct device *coredev;
> Why? If this is just &pdev->dev it it doesn't need to be in the struct
> And pdev is just vfio_coredev->pdev, don't need to duplicate it either
This was actually the pds_core's device structure. I have removed this
in my local tree and instead use the pci_physfn() to get pds_core's
struct device. Will be fixed in the next revision.
>> +static void
>> +pds_vfio_recovery_work(struct work_struct *work)
>> +{
>> + struct pds_vfio_pci_device *pds_vfio =
>> + container_of(work, struct pds_vfio_pci_device, work);
>> + bool deferred_reset_needed = false;
>> +
>> + /* Documentation states that the kernel migration driver must not
>> + * generate asynchronous device state transitions outside of
>> + * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
>> + *
>> + * Since recovery is an asynchronous event received from the device,
>> + * initiate a deferred reset. Only issue the deferred reset if a
>> + * migration is in progress, which will cause the next step of the
>> + * migration to fail. Also, if the device is in a state that will
>> + * be set to VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is
>> + * shutdown and device is in VFIO_DEVICE_STATE_STOP) as that will
clear
>> + * the VFIO_DEVICE_STATE_ERROR when the VM starts back up.
>> + */
>> + mutex_lock(&pds_vfio->state_mutex);
>> + if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
>> + pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
>> + (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
>> + pds_vfio_dirty_is_enabled(pds_vfio)))
>> + deferred_reset_needed = true;
>> + mutex_unlock(&pds_vfio->state_mutex);
>> +
>> + /* On the next user initiated state transition, the device will
>> + * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's
the user's
>> + * responsibility to reset the device.
>> + *
>> + * If a VFIO_DEVICE_RESET is requested post recovery and before
the next
>> + * state transition, then the deferred reset state will be set to
>> + * VFIO_DEVICE_STATE_RUNNING.
>> + */
>> + if (deferred_reset_needed)
>> + pds_vfio_deferred_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
>> +}
> Why is this a work? it is threaded on a blocking_notifier_chain so it
> can call the mutex?
I think the work item can be dropped and the contents of the work
function can be moved in the notifier callback. I will fix this in the
next revision.
> Why is the locking like this, can't you just call
> pds_vfio_deferred_reset() under the mutex?
It was done to avoid any lock ordering issues with
pds_vfio_state_mutex_unlock() or pds_vfio_reset().
>> Add Kconfig entries and pds_vfio.rst. Also, add an entry in the
>> MAINTAINERS file for this new driver.
>>
>> It's not clear where documentation for vendor specific VFIO
>> drivers should live, so just re-use the current amd
>> ethernet location.
> It would be nice to make a kdoc section for vfio.
It seems like there are already vfio docs in Documentation/driver-api/,
but the kdoc added in this patch is slightly different since it's vendor
specific. Which of the following locations make the most sense?
[1] Documentation/vfio/<vendor>/<vendor_kdoc>
- Documentation/vfio/amd/pds_vfio.rst
[2] Documentation/vfio/vendor-drivers/<vendor_kdoc>
- Documentation/vfio/vendor-drivers/pds_vfio.rst
Thanks again for the time and feedback,
Brett