>-----Original Message----- >From: Dalessandro, Dennis >Sent: Friday, February 8, 2019 8:41 AM >To: Jason Gunthorpe <jgg@xxxxxxxx> >Cc: dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Ruhl, Michael J ><michael.j.ruhl@xxxxxxxxx>; Marciniszyn, Mike ><mike.marciniszyn@xxxxxxxxx> >Subject: Re: [PATCH for-rc 1/7] IB/{hfi1, qib, rdmavt}: Do not depend on IB >Verbs name for driver logging > >On 1/17/2019 4:04 PM, Jason Gunthorpe wrote: >> On Thu, Jan 17, 2019 at 12:41:12PM -0800, Dennis Dalessandro wrote: >>> From: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx> >>> >>> Commit 896de0090a85 ("RDMA/core: Use dev_name instead of >>> ibdev->name"), changes the API for setting the IB device name. >>> >>> HFI, QIB, and RDMAVT make assumptions with regards to this API (i.e. >>> as to when the device name string is available). This assumption is >>> no longer valid. >>> >>> Update the drivers to reflect the commit by storing the specific >>> device name in device local information. Create a calldown for the >>> RVT driver to access the drivers name information. >>> >>> Rework the logging macros to use the best device name. >>> >>> Fixes: 11f0e89710eb ("IB/{hfi1, qib}: Fix a concurrency issue with device >name in logging") >>> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx> >>> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx> >>> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx> >>> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx> >>> --- >>> drivers/infiniband/hw/hfi1/affinity.c | 6 +++-- >>> drivers/infiniband/hw/hfi1/driver.c | 8 +++++++ >>> drivers/infiniband/hw/hfi1/hfi.h | 22 ++++++++----------- >>> drivers/infiniband/hw/hfi1/init.c | 2 +- >>> drivers/infiniband/hw/hfi1/verbs.c | 1 + >>> drivers/infiniband/hw/qib/qib.h | 12 +++++----- >>> drivers/infiniband/hw/qib/qib_driver.c | 8 +++++++ >>> drivers/infiniband/hw/qib/qib_init.c | 10 ++++++-- >>> drivers/infiniband/hw/qib/qib_verbs.c | 1 + >>> drivers/infiniband/sw/rdmavt/trace.h | 6 +++-- >>> drivers/infiniband/sw/rdmavt/vt.c | 2 +- >>> drivers/infiniband/sw/rdmavt/vt.h | 10 ++++++-- >>> include/rdma/rdma_vt.h | 38 ++++++-------------------------- >>> 13 files changed, 64 insertions(+), 62 deletions(-) >> >> This is too ugly. Drivers should not be defining their own printing >> macros, or dealing with this name stuff. They shouldn't be randomly >> choosing to print to the PCI struct device vs the ib struct device and >> all sorts of other inconsistent things we have. > >Why is it ugly? This isn't new. There is nothing random about what is >chosen to print. The PCI is used before we have an IB dev. That doesn't >change before/after this patch. > >Are you referring to inconsistent things in hfi1/rdmavt/qib or from an >IB core point of view? > >> If drivers need to print prior to registration, then we need to copy >> what netdev has been doing and provide our own printing functions >> accepting a struct ib_device* that go through various gyrations to >> compute a suitable identification string across the entire life cycle >> - ie netdev will use a PCI BDF and other options if the ethX name is >> not yet assigned. > >Which is what we do here, use the PCI BDF right? Did I miss something? >Or are you saying it should be added to the core rather than the driver. >I'd be OK with that as well. > >> I stopped short of doing this when I injected dev_name, because it >> didn't seem necessary, but if you say hfi has a problem, then this is >> how to fix it. >> >>> --- a/drivers/infiniband/hw/hfi1/init.c >>> +++ b/drivers/infiniband/hw/hfi1/init.c >>> @@ -1313,7 +1313,7 @@ void hfi1_free_devdata(struct hfi1_devdata *dd) >>> "Could not allocate unit ID: error %d\n", -ret); >>> goto bail; >>> } >>> - rvt_set_ibdev_name(&dd->verbs_dev.rdi, "%s_%d", class_name(), >dd->unit); >>> + kobject_set_name(&dd->kobj, "%s_%d", class_name(), dd->unit); >> >> Woah, this driver has a kobj and a struct ib_device in the same >> structure? Betcha the refcounting is done wrong :( > >Been like that for quite a while. Mike has looked at this and I trust >him when he says the refcounting is correct. Do you see something wrong >here? If so please point it out. > >> This kobject is completely redundant with the kobject that is already >> in the struct ib_device, you should delete it, not use it for more >> stuff. > >The kobject in the ib_device is just that, for ib_device stuff. The one >here is for hfi1 stuff. Why muddle them together? Is there an advantage >to that? > >>> /* >>> * Initialize all locks for the device. This needs to be as early as >>> diff --git a/drivers/infiniband/hw/hfi1/verbs.c >b/drivers/infiniband/hw/hfi1/verbs.c >>> index ec582d8..6546cba 100644 >>> --- a/drivers/infiniband/hw/hfi1/verbs.c >>> +++ b/drivers/infiniband/hw/hfi1/verbs.c >>> @@ -1680,6 +1680,7 @@ int hfi1_register_ib_device(struct hfi1_devdata >*dd) >>> * Fill in rvt info object. >>> */ >>> dd->verbs_dev.rdi.driver_f.port_callback = hfi1_create_port_files; >>> + dd->verbs_dev.rdi.driver_f.get_device_name = >hfi1_get_device_name; >>> dd->verbs_dev.rdi.driver_f.get_pci_dev = get_pci_dev; >>> dd->verbs_dev.rdi.driver_f.check_ah = hfi1_check_ah; >>> dd->verbs_dev.rdi.driver_f.notify_new_ah = hfi1_notify_new_ah; >>> diff --git a/drivers/infiniband/hw/qib/qib.h >b/drivers/infiniband/hw/qib/qib.h >>> index 83d2349..3681701 100644 >>> --- a/drivers/infiniband/hw/qib/qib.h >>> +++ b/drivers/infiniband/hw/qib/qib.h >>> @@ -1075,6 +1075,8 @@ struct qib_devdata { >>> struct tasklet_struct error_tasklet; >>> >>> int assigned_node_id; /* NUMA node closest to HCA */ >>> + /* device name */ >>> + char *name; >>> }; >> >> Don't store your own names :( > >Can you explain the issue here? > >> >>> -/** >>> - * rvt_set_ibdev_name - Craft an IB device name from client info >>> - * @rdi: pointer to the client rvt_dev_info structure >>> - * @name: client specific name >>> - * @unit: client specific unit number. >>> - */ >>> -static inline void rvt_set_ibdev_name(struct rvt_dev_info *rdi, >>> - const char *fmt, const char *name, >>> - const int unit) >>> -{ >>> - /* >>> - * FIXME: rvt and its users want to touch the ibdev before >>> - * registration and have things like the name work. We don't have the >>> - * infrastructure in the core to support this directly today, hack it >>> - * to work by setting the name manually here. >>> - */ >>> - dev_set_name(&rdi->ibdev.dev, fmt, name, unit); >>> - strlcpy(rdi->ibdev.name, dev_name(&rdi->ibdev.dev), >IB_DEVICE_NAME_MAX); >> >> So why wasn't this hack good enough? What is the issue? > >You literally wrote FIXME, so Mike took that to mean he should fix it. I have an alternate patch that uses: 1) a device name private to the rdmavt instance (for rdmavt logging) (adds the name as a char) 2) doesn't use the kobject from HFI 3) doesn't use the char in QIB 4) uses hardwired text rather than a string in the logging function.: i.e : "hfi1_%d", dd->unit. Would it be useful to submit that? M >-Denny > >