Re: commit ab118da4c10a70b8437f5c90ab77adae1835963e causes ib_srpt to fail connections served by target LIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2020-03-15 at 18:47 +0200, Max Gurtovoy wrote:
> On 3/14/2020 11:30 PM, Laurence Oberman wrote:
> > Hello Bart, Leon and Max
> 
> Hi Laurence,
> 
> thanks for the great analysis and the fast response !!
> 
> > 
> > Max had reached out to me to test a new set of patches for SRQ.
> > I had not tested upstream ib_srpt on an LIO target for quite a
> > while,
> > only ib_srp client tests had been run of late.
> > During a baseline test before applying Max's patches it was
> > apparent
> > that something had broken ib_srpt connections within LIO target
> > since
> > 5.5.
> > 
> > Note thet ib_srp client connectivity with the commit functions
> > fine,
> > it's just the target that breaks with this commit.
> > 
> > After a long bisect this is the commit that seems to break it.
> > While it's not directly code in ib_srpt, its code in mlx5 vport
> > ethernet connectivity that then breaks ib_srpt connectivity over
> > mlx5
> > IB RDMA with LIO.
> 
> I was able to connect in loopback and also from remote initiator
> with 
> this commit.
> 
> So I'm not sure that this commit is broken.
> 
> I used Bart's scripts to configure the target and to connect to it
> in 
> loopback (after some modifications for the updated
> kernel/sysfs/configfs 
> interface).
> 
> I did see an issue to connect from remote initiator, but after
> reloading 
> openibd in the initiator side I was able to connect.
> 
> So I suspect you had the same issue - that also should be debugged.
> 
> > 
> > I will let Leon and others decide but reverting the below commit
> > allows
> > SRP connectivity to an LIO target to work again.
> 
> I added prints to "mlx5_core_modify_hca_vport_context" function and 
> found that we don't call it in "pure" mlx5 mode with PFs.
> 
> Maybe you can try it too...
> 
> I was able to check my patches on my system and I'll send them soon.
> 
> Thanks again Laurence and Bart.
> 
> > 
> > Max, I will test your new patches once we have a decision on this.
> > 
> > Client
> > Linux ibclient.lab.eng.bos.redhat.com 5.6.0-rc5+ #1 SMP Thu Mar 12
> > 16:58:19 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > Server with reverted commit
> > Linux fedstorage.bos.redhat.com 5.6.0-rc5+ #1 SMP Sat Mar 14
> > 16:39:35
> > EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > commit ab118da4c10a70b8437f5c90ab77adae1835963e
> > Author: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > Date:   Wed Nov 13 12:03:47 2019 +0200
> > 
> >      net/mlx5: Don't write read-only fields in
> > MODIFY_HCA_VPORT_CONTEXT
> > command
> >      
> >      The MODIFY_HCA_VPORT_CONTEXT uses field_selector to mask
> > fields
> > needed
> >      to be written, other fields are required to be zero according
> > to
> > the
> >      HW specification. The supported fields are controlled by
> > bitfield
> >      and limited to vport state, node and port GUIDs.
> >      
> >      Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >      Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxxxx>
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
> > b/drivers/net/ethernet/mellanox/mlx5
> > index 30f7848..1faac31f 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
> > @@ -1064,26 +1064,13 @@ int
> > mlx5_core_modify_hca_vport_context(struct
> > mlx5_core_dev *dev,
> >   
> >          ctx = MLX5_ADDR_OF(modify_hca_vport_context_in, in,
> > hca_vport_context);
> >          MLX5_SET(hca_vport_context, ctx, field_select, req-
> > > field_select);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, sm_virt_aware, req-
> > > sm_virt_aware);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, has_smi, req->has_smi);
> > -       MLX5_SET(hca_vport_context, ctx, has_raw, req->has_raw);
> > -       MLX5_SET(hca_vport_context, ctx, vport_state_policy, req-
> > > policy);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, port_physical_state, req-
> > > phys_state);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, vport_state, req-
> > > vport_state);
> > 
> > -       MLX5_SET64(hca_vport_context, ctx, port_guid, req-
> > >port_guid);
> > -       MLX5_SET64(hca_vport_context, ctx, node_guid, req-
> > >node_guid);
> > -       MLX5_SET(hca_vport_context, ctx, cap_mask1, req-
> > >cap_mask1);
> > -       MLX5_SET(hca_vport_context, ctx, cap_mask1_field_select,
> > req-
> > > cap_mask1_perm);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, cap_mask2, req-
> > >cap_mask2);
> > -       MLX5_SET(hca_vport_context, ctx, cap_mask2_field_select,
> > req-
> > > cap_mask2_perm);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, lid, req->lid);
> > -       MLX5_SET(hca_vport_context, ctx, init_type_reply, req-
> > > init_type_reply);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, lmc, req->lmc);
> > -       MLX5_SET(hca_vport_context, ctx, subnet_timeout, req-
> > > subnet_timeout);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, sm_lid, req->sm_lid);
> > -       MLX5_SET(hca_vport_context, ctx, sm_sl, req->sm_sl);
> > -       MLX5_SET(hca_vport_context, ctx, qkey_violation_counter,
> > req-
> > > qkey_violation_counter);
> > 
> > -       MLX5_SET(hca_vport_context, ctx, pkey_violation_counter,
> > req-
> > > pkey_violation_counter);
> > 
> > +       if (req->field_select & MLX5_HCA_VPORT_SEL_STATE_POLICY)
> > +               MLX5_SET(hca_vport_context, ctx,
> > vport_state_policy,
> > +                        req->policy);
> > +       if (req->field_select & MLX5_HCA_VPORT_SEL_PORT_GUID)
> > +               MLX5_SET64(hca_vport_context, ctx, port_guid, req-
> > > port_guid);
> > 
> > +       if (req->field_select & MLX5_HCA_VPORT_SEL_NODE_GUID)
> > +               MLX5_SET64(hca_vport_context, ctx, node_guid, req-
> > > node_guid);
> > 
> >          err = mlx5_cmd_exec(dev, in, in_sz, out, sizeof(out));
> >   ex:
> >          kfree(in);
> >   
> > 
> 
> 


Hi Max
Re:

"
So I'm not sure that this commit is broken.
..
..
I added prints to "mlx5_core_modify_hca_vport_context" function and
found that we don't call it in "pure" mlx5 mode with PFs.

Maybe you can try it too...
"

The thing is without this commit we connect immediately, no delay no
issue and I am changing nothing else other than reverting here.

So this clearly has a bearing directly on the functionality.

I will look at adding more debug, but with this commit in there is nor
evidence even of an attempt to connect and fail.

Its silently faling.

Regards
Laurence







[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux