RE: [PATCH 1/6] Add ancillary bus support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Friday, October 2, 2020 11:50 AM
> 
> On Fri, Oct 02, 2020 at 05:29:26AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > > Sent: Friday, October 2, 2020 1:02 AM
> >
> > > > > > > > +Registering an ancillary_device is a two-step process.
> > > > > > > > +First you must call ancillary_device_initialize(), which
> > > > > > > > +will check several aspects of the ancillary_device struct
> > > > > > > > +and perform a device_initialize().  After this step
> > > > > > > > +completes, any error state must have a call to
> > > > > > > > +put_device() in its
> > > > > > > resolution
> > > > > > > > +path.  The second step in registering an ancillary_device
> > > > > > > > +is to perform a
> > > > > > > call
> > > > > > > > +to ancillary_device_add(), which will set the name of the
> > > > > > > > +device and add
> > > > > > > the
> > > > > > > > +device to the bus.
> > > > > > > > +
> > > > > > > > +To unregister an ancillary_device, just a call to
> > > > > > > ancillary_device_unregister()
> > > > > > > > +is used.  This will perform both a device_del() and a
> put_device().
> > > > > > >
> > > > > > > Why did you chose ancillary_device_initialize() and not
> > > > > > > ancillary_device_register() to be paired with
> > > > > ancillary_device_unregister()?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > We originally had a single call to ancillary_device_register()
> > > > > > that paired with unregister, but there was an ask to separate
> > > > > > the register into an initialize and add to make the error
> > > > > > condition unwind more
> > > > > compartimentalized.
> > > > >
> > > > > It is correct thing to separate, but I would expect:
> > > > > ancillary_device_register()
> > > > > ancillary_device_add()
> > > > >
> > > > device_initialize(), device_add() and device_unregister() is the
> > > > pattern widely
> > > followed in the core.
> > >
> > > It doesn't mean that I need to agree with that, right?
> > >
> > Right. May be I misunderstood your comment where you said "I expect
> device_register() and device_add()" in response to "device_initialize() and
> device_add".
> > I interpreted your comment as to replace initialize() with register().
> > Because that is odd naming and completely out of sync from the core APIs.
> >
> > A helper like below that wraps initialize() and add() is buggy, because when
> register() returns an error, it doesn't know if should do kfree() or
> put_device().
> 
> I wrote above (14 lines above this line) that I understand and support the
> request to separate init and add parts. There is only one thing that I didn't
> like that we have _unregister() but don't have _register().
> It is not a big deal.
> 
> >
> > ancillary_device_register(adev)
> > {
> >   ret = ancillary_device_initialize();
> >   if (ret)
> >     return ret;
> >
> >   ret = ancillary_device_add();
> >   if (ret)
> >     put_device(dev);
> >   return ret;
> > }
> >
> > > >
> > > > > vs.
> > > > > ancillary_device_unregister()
> > > > >
> > > > > It is not a big deal, just curious.
> > > > >
> > > > > The much more big deal is that I'm required to create 1-to-1
> > > > > mapping between device and driver, and I can't connect all my
> > > > > different modules to one xxx_core.pf.y device in N-to-1 mapping. "N"
> > > > > represents different protocols (IB, ETH, SCSI) and "1" is one PCI core.
> > > >
> > > > For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa,
> en).
> > > >
> > > > So there should be one ida allocate per mlx5 device type.
> > > >
> > > > Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] =
> {
> > > > 	"rdma",
> > > > 	"eth",
> > > > 	"vdpa"
> > > > };
> > > >
> > > > Something like for current mlx5_register_device(),
> > >
> > > I know it and already implemented it, this is why I'm saying that it
> > > is not what I would expect from the implementation.
> > >
> > > It is wrong create mlx5_core.rdma.1 device that is equal to
> > > mlx5_core.eth.1 just to connect our mlx5_ib.ko to it, while
> > > documentation explains about creating
> >
> > Ancillary bus's documentation? If so, it should be corrected.
> > Do you have specific text snippet to point to that should be fixed?
> 
> +One example could be a multi-port PCI network device that is
> +rdma-capable and needs to export this functionality and attach to an
> +rdma driver in another subsystem.  The PCI driver will allocate and
> +register an ancillary_device for each physical function on the NIC.
> +The rdma driver will register an ancillary_driver that will be matched
> +with and probed for each of these ancillary_devices.  This will give
> +the rdma driver access to the shared data/ops in the PCI drivers shared
> object to establish a connection with the PCI driver.
> 
> >
> > For purpose of matching service functionality, for each different class (ib,
> eth, vdpa) there is one ancillary device created.
> > What exactly is wrong here? (and why is it wrong now and not in
> > previous version of the RFC?)
> 
> Here the example of two real systems, see how many links we created to
> same mlx5_core PCI logic. I imagine that Qlogic and Chelsio drivers will look
> even worse, because they spread over more subsystems than mlx5.
> 
> This is first time when I see real implementation of real device.
> 
> System with 2 IB and 1 RoCE cards:
> [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
>  mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.eth.0
>  mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.eth.1
>  mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
This gives you the ability to not load the netdevice and rdma device of a VF and only load the vdpa device.
These are real use case that users have asked for.
In use case one, they are only interested in rdma device.
In second use case only vdpa device.
How shall one achieve that without spinning of the device for each class?

>  mlx5_core.ib.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
>  mlx5_core.ib.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
>  mlx5_core.ib.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
>  mlx5_core.vdpa.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
>  mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
>  mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
> [leonro@vm ~]$ rdma dev
> 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455
> sys_image_guid 5254:00c0:fe12:3455
> 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456
> sys_image_guid 5254:00c0:fe12:3456
> 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457
> sys_image_guid 5254:00c0:fe12:3457
> 
> System with RoCE SR-IOV card with 4 VFs:
> 
> Maybe, what I would like to see is:
> [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
>  mlx5_core.pf.0 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.pf.0
>  mlx5_core.vf.0 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vf.0
>  mlx5_core.vf.1 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vf.1
>  mlx5_core.vf.2 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vf.2
>  mlx5_core.vf.3 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vf.3
> 
> The mlx5_ib, mlx5_vdpa and mlx5_en will connect to one of mlx5_core.XX.YY
> devices.
In that case, I really don't see the need of creating ancillary device at all.
A generic wrapper around blocking_notifier_chain_register() with a notion of id, and some well defined structure as library can serve the purpose.
But it will miss out power suspend() resume() hooks, which we get for free now using ancillary device, in addition to the ability to selectively load only few class device.
Each 'struct device's is close to 744 bytes in size in 5.9 kernel.
Is so many 'struct device' of ancillary type a real problem, that aims to address these use cases?

> 
> Thanks




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux