On 7/17/24 10:36, Leon Romanovsky wrote:
> On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
>> On 7/16/24 16:40, Jason Gunthorpe wrote:
>>> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
>>>> On 7/12/24 16:08, Jason Gunthorpe wrote:
>>>>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>>>>>
>>>>>> We need the core driver to access the IB driver (and to the ETH driver as
>>>>>> well). As you wrote, we can't use exported symbols from our IB driver nor
>>>>>> rely on function pointers, but what about providing the core driver an ops
>>>>>> structure? meaning exporting a register function from the core driver that
>>>>>> should be called by the IB driver during auxiliary device probe.
>>>>>> Something like:
>>>>>>
>>>>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>>>>>>                                struct hbl_ib_ops *ops)
>>>>>> {
>>>>>> ...
>>>>>> }
>>>>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>>>>>
>>>>> Definitely do not do some kind of double-register like this.
>>>>>
>>>>> The auxiliary_device scheme can already be extended to provide ops for
>>>>> each sub device.
>>>>>
>>>>> Like
>>>>>
>>>>> struct habana_driver {
>>>>>         struct auxiliary_driver base;
>>>>>         const struct habana_ops *ops;
>>>>> };
>>>>>
>>>>> If the ops are justified or not is a different question.
>>>>>
>>>>
>>>> Well, I suggested this double-register option because I got a comment that
>>>> the design pattern of embedded ops structure shouldn't be used.
>>>> So I'm confused now...
>>>
>>> Yeah, don't stick ops in random places, but the device_driver is the
>>> right place.
>>>
>>
>> Sorry, let me explain again. My original code has an ops structure
>> exactly like you are suggesting now (see struct hbl_aux_dev in the first
>> patch of the series).
>> But I was instructed not to use this ops structure
>> and to rely on exported symbols for inter-driver communication.
>> I'll be happy to use this ops structure like in your example rather than
>> converting my code to use exported symbols.
>> Leon - am I missing anything? what's the verdict here?
>
> You are missing the main sentence from Jason's response: "don't stick ops in random places".
>
> It is fine to have ops in device driver, so the core driver can call them. However, in your
> original code, you added ops everywhere. It caused the need to implement module reference
> counting and crazy stuff like calls to lock and unlock functions from the aux driver to the core.
>
> Verdict is still the same. Core driver should provide EXPORT_SYMBOLs, so the aux driver can call
> them directly and enjoy proper module loading and unloading.
>
> The aux driver can have ops in the device driver, so the core driver can call them to perform something
> specific for that aux driver.
>
> Calls between aux drivers should be done via the core driver.
>
> Thanks

The only place we have an ops structure is in the device driver, similarly
to Jason's example. In our code it is struct hbl_aux_dev. What other random
places did you see?
We have several auxiliary devices, so we have several instances of this
structure, but the definition is in a single place.

The module reference counting is unrelated to the ops structure - we used
it to block removal of the son driver while the parent driver can still
access it. Even with exported symbols we would use it.
Anyway, in v2 we'd like to allow the son driver to be removed before the
parent, so this module reference counting will go away.

The lock/unlock functions are also unrelated to the ops structure; we would
add these even with exported symbols. The reason is that our NIC drivers
are the sons/grandsons of a compute device which can enter a reset flow as
part of a TDR mechanism.
During this flow we must not access the HW, so we need to block any
parallel probing of a son device.

In addition, we don't have any direct communication between the aux
drivers; everything is done via the parent driver.

Given all of the above, what is the problem with our current code? We did
exactly what Jason wrote in his example - having an ops structure in the
device driver to allow inter-driver communication.

The only issue I see here is the question of whether this ops structure is
for unidirectional communication (meaning parent to son only) or for
bidirectional communication between the drivers (meaning also son to
parent). That is the only point Jason did not address, while you are clear
about the answer.
AFAIU EXPORT_SYMBOLs should be used to expose driver level operations, not
operations which are device specific (and that's our case). Hence we also
used this ops structure for son-to-parent communication, although we can
replace those ops with exported symbols if we have to.
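
To make the two directions concrete, below is a minimal userspace sketch of
the split Leon describes. It is not code from the series: the struct
definitions are simplified stand-ins for the kernel types, and
hbl_core_request_reset(), hbl_core_notify_port() and the port_event op are
hypothetical names used only for illustration. Parent to son goes through
the ops that sit next to the auxiliary driver; son to parent is a direct
call to what would be an exported symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel types (assumption: only the
 * fields this sketch needs; the real structs look different). */
struct auxiliary_driver {
	const char *name;
};

struct auxiliary_device {
	struct auxiliary_driver *drv;	/* driver bound to this device */
	int port_state;
};

/* Hypothetical ops the core calls on the son (parent to son). */
struct habana_ops {
	int (*port_event)(struct auxiliary_device *adev, int link_up);
};

/* Same shape as Jason's example: the ops live with the driver. */
struct habana_driver {
	struct auxiliary_driver base;
	const struct habana_ops *ops;
};

/* ---- core (parent) side ---- */

static int core_resets;

/* Hypothetical; in the kernel this would carry EXPORT_SYMBOL() and be
 * called directly by the son (son to parent). */
int hbl_core_request_reset(struct auxiliary_device *adev)
{
	(void)adev;
	return ++core_resets;
}

/* Hypothetical; the core reaches the son's ops via the bound driver. */
int hbl_core_notify_port(struct auxiliary_device *adev, int link_up)
{
	struct habana_driver *hdrv =
		container_of(adev->drv, struct habana_driver, base);

	assert(hdrv->ops && hdrv->ops->port_event);
	return hdrv->ops->port_event(adev, link_up);
}

/* ---- IB (son) side ---- */

static int ib_port_event(struct auxiliary_device *adev, int link_up)
{
	adev->port_state = link_up;
	if (!link_up)
		return hbl_core_request_reset(adev);	/* direct call, no ops */
	return 0;
}

static const struct habana_ops ib_ops = {
	.port_event = ib_port_event,
};

static struct habana_driver ib_driver = {
	.base = { .name = "hbl_ib" },
	.ops = &ib_ops,
};
```

With a device bound to ib_driver.base, hbl_core_notify_port(adev, 0)
dispatches to ib_port_event() through the ops, which in turn calls
hbl_core_request_reset() directly. The open question above is whether that
second call may also go through an ops structure when the operation is
device specific.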