On Tue, Nov 28, 2017 at 02:00:12PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2017 at 09:03:46PM +0200, Yuval Shaia wrote:
>
> > I agree that patch as it is now does not really handle the case where one
> > port fails so it needs to be fixed.
> >
> > The thing is that from your perspective the idea itself is wrong, i.e. if
> > one (of for example two ports) fails the driver needs to continue and serve
> > the other port and just print error message.
>
> On this point, I think if ports are completely independent at the ipoib
> layer then they should not become linked during the add process.
>
> ie if a port is working and a second port fails then it should not
> kill the first port.
>
> However, it is unfortunate we have no recovery from this case at all.
>
> Alex V: However, why is the current behavior a problem? Is this
> because of a dual port card with IB and ROCE concurrently? And the
> add 'fails' the ROCE port even though it isn't even really a failure?
> We certainly shouldn't print in that case..

It is a problem for one port cards too, I see such print on my system:

root@mtr-leonro:~# dmesg |grep Fail
[    7.785329] Failed to init port, removing it
root@mtr-leonro:~# /mnt/iproute2/rdma/rdma link
1/1: mlx5_0/1: subnet_prefix fe80:0000:0000:0000 lid 13399 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP
2/1: mlx5_1/1: subnet_prefix fe80:0000:0000:0000 lid 13400 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP
3/1: mlx5_2/1: subnet_prefix fe80:0000:0000:0000 lid 13401 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP
4/1: mlx5_3/1: state DOWN physical_state DISABLED
5/1: mlx5_4/1: subnet_prefix fe80:0000:0000:0000 lid 13403 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP

Thanks

>
> Jason
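
To illustrate the behaviour discussed above (a port that fails during add should not take down ports that already came up), here is a rough standalone sketch. It is not the ipoib add path: `struct port_state`, `init_port()`, `add_all_ports()` and the `should_fail` flags are all made-up names, and the failure is simulated rather than coming from real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_PORTS 8

/* Hypothetical per-port state; not the real ipoib structures. */
struct port_state {
	bool usable;	/* true once this port has been initialised */
};

/*
 * Hypothetical stand-in for per-port setup.
 * Returns 0 on success, -1 on (simulated) failure.
 */
static int init_port(struct port_state *p, bool should_fail)
{
	if (should_fail)
		return -1;
	p->usable = true;
	return 0;
}

/*
 * Add every port independently and return how many came up.
 * A failure on one port is reported and skipped; ports that were
 * initialised earlier are left untouched.
 */
static int add_all_ports(struct port_state *ports, const bool *should_fail,
			 int nports)
{
	int added = 0;

	assert(nports >= 0 && nports <= MAX_PORTS);

	for (int i = 0; i < nports; i++) {
		ports[i].usable = false;
		if (init_port(&ports[i], should_fail[i]) != 0) {
			/* Only a real failure is reported here. */
			fprintf(stderr, "port %d: init failed, skipping\n", i + 1);
			continue;
		}
		added++;
	}
	return added;
}
```

With three ports where only the second one fails, `add_all_ports()` would report two ports added and leave the first and third usable. Whether a port that is simply not applicable (rather than broken) should be reported at all is a separate question, as raised in the quoted mail.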