On Mon, Nov 28, 2022 at 06:01:13PM -0800, Ajit Khaparde wrote:
> On Tue, Nov 22, 2022 at 10:59 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Tue, Nov 22, 2022 at 07:02:45AM -0800, Ajit Khaparde wrote:
> > > On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > >
> > > ::snip::
> > > > > > All PCI management logic and interfaces are needed to be inside eth part
> > > > > > of your driver and only that part should implement SR-IOV config. Once
> > > > > > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > > > > > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > > > > > driver to bind to them.
> > > > > I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> > > > > Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> > > > > design is that
> > > > > the RoCE driver is responsible for making adjustments to the RoCE resources.
> > > >
> > > > You can still do these adjustments by checking type of function that
> > > > called to RDMA .probe. PCI core exposes some functions to help distinguish between
> > > > PF and VFs.
> > > >
> > > > >
> > > > > So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> > > > > NIC resources for the VF, and such, it tries to call into the bnxt_re
> > > > > driver for the
> > > > > same purpose.
> > > >
> > > > If I read code correctly, all these resources are for one PCI function.
> > > >
> > > > Something like this:
> > > >
> > > > bnxt_re_probe()
> > > > {
> > > >   ...
> > > >         if (is_virtfn(p))
> > > >                 bnxt_re_sriov_config(p);
> > > >   ...
> > > > }
> > > I understand what you are suggesting.
> > > But what I want is a way to do this in the context of the PF
> > > preferably before the VFs are probed.
> >
> > I don't understand the last sentence.
> > You call to this sriov_config in
> > bnxt_re driver without any protection from VFs being probed,
>
> Let me elaborate -
> When a user sets num_vfs to a non-zero number, the PCI driver hook
> sriov_configure calls bnxt_sriov_configure(). Once pci_enable_sriov()
> succeeds, bnxt_ulp_sriov_cfg() is issued under bnxt_sriov_configure().
> All this happens under bnxt_en.
> bnxt_ulp_sriov_cfg() ultimately calls into the bnxt_re driver.
> Since bnxt_sriov_configure() is called only for PFs, bnxt_ulp_sriov_cfg()
> is called for PFs only.
>
> Once bnxt_ulp_sriov_cfg() calls into the bnxt_re via the ulp_ops,
> it adjusts the QPs, SRQs, CQs, MRs, GIDs and such.

Once you call pci_enable_sriov(), the PCI core creates sysfs entries,
which triggers udev rules and VF probing. Because you call it from
bnxt_sriov_configure(), you have inherent protection for the PF through
the PCI lock, but not for the VFs.

>
> >
> > > So we are trying to call the
> > > bnxt_re_sriov_config in the context of handling the PF's
> > > sriov_configure implementation. Having the ulp_ops is allowing us to
> > > avoid resource wastage and assumptions in the bnxt_re driver.
> >
> > To which resource wastage are you referring?
> Essentially the PF driver reserves a set of above resources for the PF,
> and divides the remaining resources among the VFs.
> If the calculation is based on sriov_totalvfs instead of sriov_numvfs,
> there can be a difference in the resources provisioned for a VF.
> And that is because a user may create a subset of VFs instead of the
> total VFs allowed in the PCI SR-IOV capability register.
> I was referring to the resource wastage in that deployment scenario.

That is fine, set all the needed limits in bnxt_en. You don't need to
call into bnxt_re for that.

>
> Thanks
> Ajit
>
> >
> > There are no differences if same limits will be in bnxt_en driver when
> > RDMA bnxt device is created or in bnxt_re which will be called once RDMA
> > device is created.
> >
> > Thanks
> >
> > >
> > > ::snip::
> > > >
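
To make the sriov_totalvfs versus sriov_numvfs point concrete, here is a
small userspace sketch of the arithmetic being discussed. It is not
bnxt_en or bnxt_re code: struct res_pool, per_vf_share() and stranded()
are hypothetical names, and it assumes a plain even split of whatever is
left after the PF reservation, which may not match what the real driver
or firmware does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one pooled RoCE resource (QPs, CQs, MRs, ...). */
struct res_pool {
	uint32_t total;       /* resources the device exposes */
	uint32_t pf_reserved; /* share kept back for the PF */
};

/* Split what is left after the PF reservation evenly across vf_count VFs. */
static uint32_t per_vf_share(const struct res_pool *pool, uint32_t vf_count)
{
	assert(pool->pf_reserved <= pool->total);

	if (vf_count == 0)
		return 0; /* SR-IOV disabled, nothing to hand out */

	return (pool->total - pool->pf_reserved) / vf_count;
}

/*
 * Resources left unassigned when the per-VF share is computed from
 * total_vfs but only num_vfs VFs were actually enabled by the user.
 */
static uint32_t stranded(const struct res_pool *pool,
			 uint32_t total_vfs, uint32_t num_vfs)
{
	assert(num_vfs <= total_vfs);

	return (pool->total - pool->pf_reserved) -
	       per_vf_share(pool, total_vfs) * num_vfs;
}
```

With made-up numbers (65536 resources, 1024 kept for the PF, 64 VFs
allowed by the capability but only 4 enabled), sizing by the total gives
each VF 1008 and leaves 60480 unused, while sizing by the enabled count
gives each VF 16128. Nothing in this calculation depends on which driver
runs it, so it can live in bnxt_en as long as it uses the enabled VF
count.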