RE: [RFC PATCH v4 02/25] ice: Create and register virtual bus for RDMA

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Friday, February 14, 2020 12:40 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@xxxxxxxxx>
> Cc: davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Ertman, David M
> <david.m.ertman@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> rdma@xxxxxxxxxxxxxxx; nhorman@xxxxxxxxxx; sassmann@xxxxxxxxxx;
> Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; Bowers, AndrewX
> <andrewx.bowers@xxxxxxxxx>
> Subject: Re: [RFC PATCH v4 02/25] ice: Create and register virtual bus for
> RDMA
> 
> On Wed, Feb 12, 2020 at 11:14:01AM -0800, Jeff Kirsher wrote:
> > +/**
> > + * ice_init_peer_devices - initializes peer devices
> > + * @pf: ptr to ice_pf
> > + *
> > + * This function initializes peer devices on the virtual bus.
> > + */
> > +int ice_init_peer_devices(struct ice_pf *pf)
> > +{
> > +	struct ice_vsi *vsi = pf->vsi[0];
> > +	struct pci_dev *pdev = pf->pdev;
> > +	struct device *dev = &pdev->dev;
> > +	int status = 0;
> > +	int i;
> > +
> > +	/* Reserve vector resources */
> > +	status = ice_reserve_peer_qvector(pf);
> > +	if (status < 0) {
> > +		dev_err(dev, "failed to reserve vectors for peer drivers\n");
> > +		return status;
> > +	}
> > +	for (i = 0; i < ARRAY_SIZE(ice_peers); i++) {
> > +		struct ice_peer_dev_int *peer_dev_int;
> > +		struct ice_peer_drv_int *peer_drv_int;
> > +		struct iidc_qos_params *qos_info;
> > +		struct iidc_virtbus_object *vbo;
> > +		struct msix_entry *entry = NULL;
> > +		struct iidc_peer_dev *peer_dev;
> > +		struct virtbus_device *vdev;
> > +		int j;
> > +
> > +		/* structure layout needed for container_of's looks like:
> > +		 * ice_peer_dev_int (internal only ice peer superstruct)
> > +		 * |--> iidc_peer_dev
> > +		 * |--> *ice_peer_drv_int
> > +		 *
> > +		 * iidc_virtbus_object (container_of parent for vdev)
> > +		 * |--> virtbus_device
> > +		 * |--> *iidc_peer_dev (pointer from internal struct)
> > +		 *
> > +		 * ice_peer_drv_int (internal only peer_drv struct)
> > +		 */
> > +		peer_dev_int = devm_kzalloc(dev, sizeof(*peer_dev_int),
> > +					    GFP_KERNEL);
> > +		if (!peer_dev_int)
> > +			return -ENOMEM;
> > +
> > +		vbo = kzalloc(sizeof(*vbo), GFP_KERNEL);
> > +		if (!vbo) {
> > +			devm_kfree(dev, peer_dev_int);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		peer_drv_int = devm_kzalloc(dev, sizeof(*peer_drv_int),
> > +					    GFP_KERNEL);
> 
> To me, this looks like a lifetime mess. All these devm allocations
> against the parent object are being referenced through the vbo with a
> different kref lifetime. The whole thing has very unclear semantics
> who should be cleaning up on error

Will cover this at the end after addressing your following points =) 

In my reply, I am going to refer to the kernel object that is registering the
virtbus_device(s) as KO_device and the kernel object that is registering
the virtbus_driver(s) as KO_driver.

> 
> > +		if (!peer_drv_int) {
> > +			devm_kfree(dev, peer_dev_int);
> > +			kfree(vbo);
> 
> ie here we free two things

At this point in the init flow for KO_device, there has only been kallocs done,
no device has been registered with virtbus.  So, only memory cleanup is
required.

> 
> > +			return -ENOMEM;
> > +		}
> > +
> > +		pf->peers[i] = peer_dev_int;
> > +		vbo->peer_dev = &peer_dev_int->peer_dev;
> > +		peer_dev_int->peer_drv_int = peer_drv_int;
> > +		peer_dev_int->peer_dev.vdev = &vbo->vdev;
> > +
> > +		/* Initialize driver values */
> > +		for (j = 0; j < IIDC_EVENT_NBITS; j++)
> > +			bitmap_zero(peer_drv_int->current_events[j].type,
> > +				    IIDC_EVENT_NBITS);
> > +
> > +		mutex_init(&peer_dev_int->peer_dev_state_mutex);
> > +
> > +		peer_dev = &peer_dev_int->peer_dev;
> > +		peer_dev->peer_ops = NULL;
> > +		peer_dev->hw_addr = (u8 __iomem *)pf->hw.hw_addr;
> > +		peer_dev->peer_dev_id = ice_peers[i].id;
> > +		peer_dev->pf_vsi_num = vsi->vsi_num;
> > +		peer_dev->netdev = vsi->netdev;
> > +
> > +		peer_dev_int->ice_peer_wq =
> > +			alloc_ordered_workqueue("ice_peer_wq_%d",
> WQ_UNBOUND,
> > +						i);
> > +		if (!peer_dev_int->ice_peer_wq)
> > +			return -ENOMEM;
> 
> Here we free nothing

This is a miss on my part.  At this point we should keep consistent and free the memory
that has been allocated as we unwind.  

> 
> > +
> > +		peer_dev->pdev = pdev;
> > +		qos_info = &peer_dev->initial_qos_info;
> > +
> > +		/* setup qos_info fields with defaults */
> > +		qos_info->num_apps = 0;
> > +		qos_info->num_tc = 1;
> > +
> > +		for (j = 0; j < IIDC_MAX_USER_PRIORITY; j++)
> > +			qos_info->up2tc[j] = 0;
> > +
> > +		qos_info->tc_info[0].rel_bw = 100;
> > +		for (j = 1; j < IEEE_8021QAZ_MAX_TCS; j++)
> > +			qos_info->tc_info[j].rel_bw = 0;
> > +
> > +		/* for DCB, override the qos_info defaults. */
> > +		ice_setup_dcb_qos_info(pf, qos_info);
> > +
> > +		/* make sure peer specific resources such as msix_count and
> > +		 * msix_entries are initialized
> > +		 */
> > +		switch (ice_peers[i].id) {
> > +		case IIDC_PEER_RDMA_ID:
> > +			if (test_bit(ICE_FLAG_IWARP_ENA, pf->flags)) {
> > +				peer_dev->msix_count = pf-
> >num_rdma_msix;
> > +				entry = &pf->msix_entries[pf-
> >rdma_base_vector];
> > +			}
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +
> > +		peer_dev->msix_entries = entry;
> > +		ice_peer_state_change(peer_dev_int,
> ICE_PEER_DEV_STATE_INIT,
> > +				      false);
> > +
> > +		vdev = &vbo->vdev;
> > +		vdev->name = ice_peers[i].name;
> > +		vdev->release = ice_peer_vdev_release;
> > +		vdev->dev.parent = &pdev->dev;
> > +
> > +		status = virtbus_dev_register(vdev);
> > +		if (status) {
> > +			virtbus_dev_unregister(vdev);
> > +			vdev = NULL;
> 
> Here we double unregister and free nothing.
> 
> You need to go through all of this really carefully and make some kind
> of sane lifetime model and fix all the error unwinding :(

Thanks for catching this.  A failure in virtbus_register_device()  does
*not* require a call virtbus_unregister_device.  The failure path for the
register function handles this.  Also, we need to remain consistent with freeing
on unwind.

> 
> Why doesn't the release() function of vbo trigger the free of all this
> peer related stuff?
> 
> Use a sane design model of splitting into functions to allocate single
> peices of memory, goto error unwind each function, and build things up
> properly.
> 
> Jason

I am going to add this to the documentation to record the following information.

The KO_device is responsible for allocating the memory for the virtbus_device
and keeping it viable for the lifetime of the KO_device.  KO_device will call
virtbus_register_device to start using the virtbus_device, and KO_device is
responslble for calling virtbus_unregister_device either on KO_device's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.

The KO_driver is responsible for allocating the memory for the virtbus_driver
and keeping it viable for the lifetime of the KO_driver. KO_driver will call
virtbus_register_driver to start using the virtbus_driver, and KO_driver is
responsible for calling virtbus_unregister_driver either on KO_driver's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.

The premise is that the KO_device and KO_driver can load and unload multiple
times and they can reconnect to each other through the virtbus on each
occurrence of their reloads.  So one example of a flow looks like the following:

- KO_device loads (order of KO_device and KO_driver loading is irrelevant)
- KO_device allocates memory for virtbus_device(s) it expects to use and
        any backing memory it is going to use to interact with KO_driver.
- KO_device performs virtbus_register_device() which is the *only* place
        a device_initialize() is performed for virtbus_device.

- KO_driver loads
- KO_driver allocates memory for virtbus_driver(s) it expects to use and
        any backing memory it expects to use to interact with KO_device
- KO_driver performs virtbus_register_driver()

- virtbus matches virtbus_device and virtbus_driver and calls the
        virtbus_drivers's probe()

- KO_driver and KO_device interact with each other however they choose to do so.

- KO_device (for example) receives a call to its remove callback
- KO_device's unload path severs any interaction the KO_device and KO_driver
        were having - implementation dependant
- KO_device's unload path is required to perform a call to
        virtbus_unregister_device().  virtbus_unregister_device() is the *only*
        place a put_device() is performed.
- KO_device's unload path frees memory associated with the virtbus_device

- vitbus calls KO_drivers's .remove callback defined for the virtbus_driver

So, the lifespan of the virtbus_device is controlled by KO_device and the
lifespan of virtbus_driver is controlled by KO_driver.

It is required for the KO's to "allocate -> register -> unregister -> free"
virtbus objects.

-DaveE




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux