On Mon, Feb 24, 2020 at 07:01:43PM +0000, Parav Pandit wrote:
> On 2/24/2020 12:29 PM, Jason Gunthorpe wrote:
> > On Mon, Feb 24, 2020 at 12:52:06PM +0200, Leon Romanovsky wrote:
> >>> Are you asking why bonding should be implemented as dedicated
> >>> ulp/driver, and not as an extension by the vendor driver?
> >>
> >> No, I meant something different. You are proposing to combine IB
> >> devices, while keeping netdev devices separated. I'm asking if it is
> >> possible to combine netdev devices with already existing bond driver
> >> and simply create new ib device with bond netdev as an underlying
> >> provider.
> >
> > Isn't that basically what we do now in mlx5?
> >
> And its broken for few aspects that I described in Q&A question-1 in
> this thread previously.
>
> On top of that user has no ability to disable rdma bonding.

And what does that mean? The real netdevs have no IP addresses so what
exactly does a non-bonded RoCEv2 RDMA device do?

> User exactly asked us that they want to disable in some cases.
> (not on mailing list). So there are non-upstream hacks exists that is
> not applicable for this discussion.

Bah, I'm aware of that - that request is hack solution to something
else as well.

> > Logically the ib_device is attached to the bond, it uses the bond for
> > IP addressing, etc.
> >
> > I'm not sure trying to have 3 ib_devices like netdev does is sane,
> > that is very, very complicated to get everything to work. The ib stuff
> > just isn't designed to be stacked like that.
> >
> I am not sure I understand all the complications you have thought through.
> I thought of few and put forward in the Q&A in the thread and we can
> improve the design as we go forward.
>
> Stacking rdma device on top of existing rdma device as an ib_client so
> that rdma bond device exactly aware of what is going on with slaves and
> bond netdev.

How do you safely proxy every single op from the bond to slaves?

How do you force the slaves to allow PDs to be shared across them?

What provider lives in user space for the bond driver? What happens to
the udata/etc?

And it doesn't solve the main problem you raised, creating an IB device
while holding RTNL simply should not ever be done. Moving this code
into the core layer fixed it up significantly for the similar rxe/siw
cases, I expect the same is possible for the LAG situation too.

Jason