On Mon, 11 Jun 2018 10:19:18 -0600 Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > On Mon, Jun 11, 2018 at 09:19:14AM +0300, jackm wrote: > > On Sun, 10 Jun 2018 22:42:03 -0600 > > Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > > > > > Er, the spec has nothing to do with this. In Linux the TID is made > > > unique because the core code provides 32 bits that are unique and > > > the user provides another 32 bits that are unique. The driver > > > cannot change any of those bits without risking non-uniquenes, > > > which is exactly the bug mlx4 created when it stepped outside its > > > bounds and improperly overrode bits in the TID for its own > > > internal use. > > > > Actually, the opposite is true here. When SRIOV is active, each VM > > generates its *own* TIDs -- with 32 bits of agent number and 32 bits > > of counter. > > And it does it while re-using the LRH of the host, so all VMs and the > host are now forced to share a TID space, yes I know. > > > There is a chance that two different VMs can generate the same TID! > > Encoding the slave (VM) number in the packet actually guarantees > > uniqueness here. > > Virtualizing the TID in the driver would be fine, but it must > virtualize all the TIDs (even those generated by the HOST). It DOES do so. The host slave id is 0. Slave numbers start with 1. If the MS byte contains a zero after paravirtualization, the MAD was sent by the host. In fact, ALL mads are paravirtualized -- including those to/from the host. > > Just blindly assuming the host doesn't generate TID's that overlap > with the virtualization process is a bug. > Not the case, host mads are also paravirtualized. > > There is nothing wrong with modifying the TID in a reversible way in > > order to: a. guarantee uniqueness b. identify the VM which should > > receive the response packet > > Sure, as long as *all* TID's sharing a LRH are vitalized like this. > > > The problem was created when the agent-id numbers started to use the > > most-significant byte (thus making the MSB slave-id addition > > impossible). > > It hasn't always been this way? What commit? > Commit: 37bfc7c1e83f1 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs"): Code snippet which replaces the MS byte is below (file drivers/infiniband/hw/mlx4/mad.c, procedure mlx4_ib_multiplex_mad() ). Just an aside: Oracle noted the problem on the *host*: The host's message log contained the warning issued on line 1529, with slave=0 (which is the hypervisor). 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1519) switch (tunnel->mad.mad_hdr.method) { 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1520) case IB_MGMT_METHOD_SET: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1521) case IB_MGMT_METHOD_GET: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1522) case IB_MGMT_METHOD_REPORT: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1523) case IB_SA_METHOD_GET_TABLE: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1524) case IB_SA_METHOD_DELETE: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1525) case IB_SA_METHOD_GET_MULTI: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1526) case IB_SA_METHOD_GET_TRACE_TBL: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1527) slave_id = (u8 *) &tunnel->mad.mad_hdr.tid; 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1528) if (*slave_id) { 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1529) mlx4_ib_warn(ctx->ib_dev, "egress mad has non-null tid msb:%d " 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1530) "class:%d slave:%d\n", *slave_id, 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1531) tunnel->mad.mad_hdr.mgmt_class, slave); 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1532) return; 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1533) } else 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1534) *slave_id = slave; 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1535) default: 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1536) /* nothing */; 37bfc7c1e (Jack Morgenstein 2012-08-03 08:40:44 +0000 1537) } > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html