> -----Original Message----- > From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Friday, February 14, 2020 12:40 PM > To: Kirsher, Jeffrey T <jeffrey.t.kirsher@xxxxxxxxx> > Cc: davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Ertman, David M > <david.m.ertman@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux- > rdma@xxxxxxxxxxxxxxx; nhorman@xxxxxxxxxx; sassmann@xxxxxxxxxx; > Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; Bowers, AndrewX > <andrewx.bowers@xxxxxxxxx> > Subject: Re: [RFC PATCH v4 02/25] ice: Create and register virtual bus for > RDMA > > On Wed, Feb 12, 2020 at 11:14:01AM -0800, Jeff Kirsher wrote: > > +/** > > + * ice_init_peer_devices - initializes peer devices > > + * @pf: ptr to ice_pf > > + * > > + * This function initializes peer devices on the virtual bus. > > + */ > > +int ice_init_peer_devices(struct ice_pf *pf) > > +{ > > + struct ice_vsi *vsi = pf->vsi[0]; > > + struct pci_dev *pdev = pf->pdev; > > + struct device *dev = &pdev->dev; > > + int status = 0; > > + int i; > > + > > + /* Reserve vector resources */ > > + status = ice_reserve_peer_qvector(pf); > > + if (status < 0) { > > + dev_err(dev, "failed to reserve vectors for peer drivers\n"); > > + return status; > > + } > > + for (i = 0; i < ARRAY_SIZE(ice_peers); i++) { > > + struct ice_peer_dev_int *peer_dev_int; > > + struct ice_peer_drv_int *peer_drv_int; > > + struct iidc_qos_params *qos_info; > > + struct iidc_virtbus_object *vbo; > > + struct msix_entry *entry = NULL; > > + struct iidc_peer_dev *peer_dev; > > + struct virtbus_device *vdev; > > + int j; > > + > > + /* structure layout needed for container_of's looks like: > > + * ice_peer_dev_int (internal only ice peer superstruct) > > + * |--> iidc_peer_dev > > + * |--> *ice_peer_drv_int > > + * > > + * iidc_virtbus_object (container_of parent for vdev) > > + * |--> virtbus_device > > + * |--> *iidc_peer_dev (pointer from internal struct) > > + * > > + * ice_peer_drv_int (internal only peer_drv struct) > > + */ > > + peer_dev_int = devm_kzalloc(dev, sizeof(*peer_dev_int), > > + GFP_KERNEL); > > + if (!peer_dev_int) > > + return -ENOMEM; > > + > > + vbo = kzalloc(sizeof(*vbo), GFP_KERNEL); > > + if (!vbo) { > > + devm_kfree(dev, peer_dev_int); > > + return -ENOMEM; > > + } > > + > > + peer_drv_int = devm_kzalloc(dev, sizeof(*peer_drv_int), > > + GFP_KERNEL); > > To me, this looks like a lifetime mess. All these devm allocations > against the parent object are being referenced through the vbo with a > different kref lifetime. The whole thing has very unclear semantics > who should be cleaning up on error Will cover this at the end after addressing your following points =) In my reply, I am going to refer to the kernel object that is registering the virtbus_device(s) as KO_device and the kernel object that is registering the virtbus_driver(s) as KO_driver. > > > + if (!peer_drv_int) { > > + devm_kfree(dev, peer_dev_int); > > + kfree(vbo); > > ie here we free two things At this point in the init flow for KO_device, there has only been kallocs done, no device has been registered with virtbus. So, only memory cleanup is required. > > > + return -ENOMEM; > > + } > > + > > + pf->peers[i] = peer_dev_int; > > + vbo->peer_dev = &peer_dev_int->peer_dev; > > + peer_dev_int->peer_drv_int = peer_drv_int; > > + peer_dev_int->peer_dev.vdev = &vbo->vdev; > > + > > + /* Initialize driver values */ > > + for (j = 0; j < IIDC_EVENT_NBITS; j++) > > + bitmap_zero(peer_drv_int->current_events[j].type, > > + IIDC_EVENT_NBITS); > > + > > + mutex_init(&peer_dev_int->peer_dev_state_mutex); > > + > > + peer_dev = &peer_dev_int->peer_dev; > > + peer_dev->peer_ops = NULL; > > + peer_dev->hw_addr = (u8 __iomem *)pf->hw.hw_addr; > > + peer_dev->peer_dev_id = ice_peers[i].id; > > + peer_dev->pf_vsi_num = vsi->vsi_num; > > + peer_dev->netdev = vsi->netdev; > > + > > + peer_dev_int->ice_peer_wq = > > + alloc_ordered_workqueue("ice_peer_wq_%d", > WQ_UNBOUND, > > + i); > > + if (!peer_dev_int->ice_peer_wq) > > + return -ENOMEM; > > Here we free nothing This is a miss on my part. At this point we should keep consistent and free the memory that has been allocated as we unwind. > > > + > > + peer_dev->pdev = pdev; > > + qos_info = &peer_dev->initial_qos_info; > > + > > + /* setup qos_info fields with defaults */ > > + qos_info->num_apps = 0; > > + qos_info->num_tc = 1; > > + > > + for (j = 0; j < IIDC_MAX_USER_PRIORITY; j++) > > + qos_info->up2tc[j] = 0; > > + > > + qos_info->tc_info[0].rel_bw = 100; > > + for (j = 1; j < IEEE_8021QAZ_MAX_TCS; j++) > > + qos_info->tc_info[j].rel_bw = 0; > > + > > + /* for DCB, override the qos_info defaults. */ > > + ice_setup_dcb_qos_info(pf, qos_info); > > + > > + /* make sure peer specific resources such as msix_count and > > + * msix_entries are initialized > > + */ > > + switch (ice_peers[i].id) { > > + case IIDC_PEER_RDMA_ID: > > + if (test_bit(ICE_FLAG_IWARP_ENA, pf->flags)) { > > + peer_dev->msix_count = pf- > >num_rdma_msix; > > + entry = &pf->msix_entries[pf- > >rdma_base_vector]; > > + } > > + break; > > + default: > > + break; > > + } > > + > > + peer_dev->msix_entries = entry; > > + ice_peer_state_change(peer_dev_int, > ICE_PEER_DEV_STATE_INIT, > > + false); > > + > > + vdev = &vbo->vdev; > > + vdev->name = ice_peers[i].name; > > + vdev->release = ice_peer_vdev_release; > > + vdev->dev.parent = &pdev->dev; > > + > > + status = virtbus_dev_register(vdev); > > + if (status) { > > + virtbus_dev_unregister(vdev); > > + vdev = NULL; > > Here we double unregister and free nothing. > > You need to go through all of this really carefully and make some kind > of sane lifetime model and fix all the error unwinding :( Thanks for catching this. A failure in virtbus_register_device() does *not* require a call virtbus_unregister_device. The failure path for the register function handles this. Also, we need to remain consistent with freeing on unwind. > > Why doesn't the release() function of vbo trigger the free of all this > peer related stuff? > > Use a sane design model of splitting into functions to allocate single > peices of memory, goto error unwind each function, and build things up > properly. > > Jason I am going to add this to the documentation to record the following information. The KO_device is responsible for allocating the memory for the virtbus_device and keeping it viable for the lifetime of the KO_device. KO_device will call virtbus_register_device to start using the virtbus_device, and KO_device is responslble for calling virtbus_unregister_device either on KO_device's exit path (remove/shutdown) or when it is done using the virtbus subsystem. The KO_driver is responsible for allocating the memory for the virtbus_driver and keeping it viable for the lifetime of the KO_driver. KO_driver will call virtbus_register_driver to start using the virtbus_driver, and KO_driver is responsible for calling virtbus_unregister_driver either on KO_driver's exit path (remove/shutdown) or when it is done using the virtbus subsystem. The premise is that the KO_device and KO_driver can load and unload multiple times and they can reconnect to each other through the virtbus on each occurrence of their reloads. So one example of a flow looks like the following: - KO_device loads (order of KO_device and KO_driver loading is irrelevant) - KO_device allocates memory for virtbus_device(s) it expects to use and any backing memory it is going to use to interact with KO_driver. - KO_device performs virtbus_register_device() which is the *only* place a device_initialize() is performed for virtbus_device. - KO_driver loads - KO_driver allocates memory for virtbus_driver(s) it expects to use and any backing memory it expects to use to interact with KO_device - KO_driver performs virtbus_register_driver() - virtbus matches virtbus_device and virtbus_driver and calls the virtbus_drivers's probe() - KO_driver and KO_device interact with each other however they choose to do so. - KO_device (for example) receives a call to its remove callback - KO_device's unload path severs any interaction the KO_device and KO_driver were having - implementation dependant - KO_device's unload path is required to perform a call to virtbus_unregister_device(). virtbus_unregister_device() is the *only* place a put_device() is performed. - KO_device's unload path frees memory associated with the virtbus_device - vitbus calls KO_drivers's .remove callback defined for the virtbus_driver So, the lifespan of the virtbus_device is controlled by KO_device and the lifespan of virtbus_driver is controlled by KO_driver. It is required for the KO's to "allocate -> register -> unregister -> free" virtbus objects. -DaveE