On Sun, 2020-03-15 at 18:47 +0200, Max Gurtovoy wrote: > On 3/14/2020 11:30 PM, Laurence Oberman wrote: > > Hello Bart, Leon and Max > > Hi Laurence, > > thanks for the great analysis and the fast response !! > > > > > Max had reached out to me to test a new set of patches for SRQ. > > I had not tested upstream ib_srpt on an LIO target for quite a > > while, > > only ib_srp client tests had been run of late. > > During a baseline test before applying Max's patches it was > > apparent > > that something had broken ib_srpt connections within LIO target > > since > > 5.5. > > > > Note thet ib_srp client connectivity with the commit functions > > fine, > > it's just the target that breaks with this commit. > > > > After a long bisect this is the commit that seems to break it. > > While it's not directly code in ib_srpt, its code in mlx5 vport > > ethernet connectivity that then breaks ib_srpt connectivity over > > mlx5 > > IB RDMA with LIO. > > I was able to connect in loopback and also from remote initiator > with > this commit. > > So I'm not sure that this commit is broken. > > I used Bart's scripts to configure the target and to connect to it > in > loopback (after some modifications for the updated > kernel/sysfs/configfs > interface). > > I did see an issue to connect from remote initiator, but after > reloading > openibd in the initiator side I was able to connect. > > So I suspect you had the same issue - that also should be debugged. > > > > > I will let Leon and others decide but reverting the below commit > > allows > > SRP connectivity to an LIO target to work again. > > I added prints to "mlx5_core_modify_hca_vport_context" function and > found that we don't call it in "pure" mlx5 mode with PFs. > > Maybe you can try it too... > > I was able to check my patches on my system and I'll send them soon. > > Thanks again Laurence and Bart. > > > > > Max, I will test your new patches once we have a decision on this. > > > > Client > > Linux ibclient.lab.eng.bos.redhat.com 5.6.0-rc5+ #1 SMP Thu Mar 12 > > 16:58:19 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux > > > > Server with reverted commit > > Linux fedstorage.bos.redhat.com 5.6.0-rc5+ #1 SMP Sat Mar 14 > > 16:39:35 > > EDT 2020 x86_64 x86_64 x86_64 GNU/Linux > > > > commit ab118da4c10a70b8437f5c90ab77adae1835963e > > Author: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > Date: Wed Nov 13 12:03:47 2019 +0200 > > > > net/mlx5: Don't write read-only fields in > > MODIFY_HCA_VPORT_CONTEXT > > command > > > > The MODIFY_HCA_VPORT_CONTEXT uses field_selector to mask > > fields > > needed > > to be written, other fields are required to be zero according > > to > > the > > HW specification. The supported fields are controlled by > > bitfield > > and limited to vport state, node and port GUIDs. > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxxxx> > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c > > b/drivers/net/ethernet/mellanox/mlx5 > > index 30f7848..1faac31f 100644 > > --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c > > @@ -1064,26 +1064,13 @@ int > > mlx5_core_modify_hca_vport_context(struct > > mlx5_core_dev *dev, > > > > ctx = MLX5_ADDR_OF(modify_hca_vport_context_in, in, > > hca_vport_context); > > MLX5_SET(hca_vport_context, ctx, field_select, req- > > > field_select); > > > > - MLX5_SET(hca_vport_context, ctx, sm_virt_aware, req- > > > sm_virt_aware); > > > > - MLX5_SET(hca_vport_context, ctx, has_smi, req->has_smi); > > - MLX5_SET(hca_vport_context, ctx, has_raw, req->has_raw); > > - MLX5_SET(hca_vport_context, ctx, vport_state_policy, req- > > > policy); > > > > - MLX5_SET(hca_vport_context, ctx, port_physical_state, req- > > > phys_state); > > > > - MLX5_SET(hca_vport_context, ctx, vport_state, req- > > > vport_state); > > > > - MLX5_SET64(hca_vport_context, ctx, port_guid, req- > > >port_guid); > > - MLX5_SET64(hca_vport_context, ctx, node_guid, req- > > >node_guid); > > - MLX5_SET(hca_vport_context, ctx, cap_mask1, req- > > >cap_mask1); > > - MLX5_SET(hca_vport_context, ctx, cap_mask1_field_select, > > req- > > > cap_mask1_perm); > > > > - MLX5_SET(hca_vport_context, ctx, cap_mask2, req- > > >cap_mask2); > > - MLX5_SET(hca_vport_context, ctx, cap_mask2_field_select, > > req- > > > cap_mask2_perm); > > > > - MLX5_SET(hca_vport_context, ctx, lid, req->lid); > > - MLX5_SET(hca_vport_context, ctx, init_type_reply, req- > > > init_type_reply); > > > > - MLX5_SET(hca_vport_context, ctx, lmc, req->lmc); > > - MLX5_SET(hca_vport_context, ctx, subnet_timeout, req- > > > subnet_timeout); > > > > - MLX5_SET(hca_vport_context, ctx, sm_lid, req->sm_lid); > > - MLX5_SET(hca_vport_context, ctx, sm_sl, req->sm_sl); > > - MLX5_SET(hca_vport_context, ctx, qkey_violation_counter, > > req- > > > qkey_violation_counter); > > > > - MLX5_SET(hca_vport_context, ctx, pkey_violation_counter, > > req- > > > pkey_violation_counter); > > > > + if (req->field_select & MLX5_HCA_VPORT_SEL_STATE_POLICY) > > + MLX5_SET(hca_vport_context, ctx, > > vport_state_policy, > > + req->policy); > > + if (req->field_select & MLX5_HCA_VPORT_SEL_PORT_GUID) > > + MLX5_SET64(hca_vport_context, ctx, port_guid, req- > > > port_guid); > > > > + if (req->field_select & MLX5_HCA_VPORT_SEL_NODE_GUID) > > + MLX5_SET64(hca_vport_context, ctx, node_guid, req- > > > node_guid); > > > > err = mlx5_cmd_exec(dev, in, in_sz, out, sizeof(out)); > > ex: > > kfree(in); > > > > > > Hi Max Re: " So I'm not sure that this commit is broken. .. .. I added prints to "mlx5_core_modify_hca_vport_context" function and found that we don't call it in "pure" mlx5 mode with PFs. Maybe you can try it too... " The thing is without this commit we connect immediately, no delay no issue and I am changing nothing else other than reverting here. So this clearly has a bearing directly on the functionality. I will look at adding more debug, but with this commit in there is nor evidence even of an attempt to connect and fail. Its silently faling. Regards Laurence