Resending as plain text for linux-rdma's sake:

From: David Solt/Dallas/IBM
To: sean.hefty@xxxxxxxxx
Cc: Geoffrey Paulsen/Dallas/IBM@IBMUS, linux-rdma@xxxxxxxxxxxxxxx
Date: 05/02/2014 09:33 AM
Subject: Re: Fw: Announcing IBM Platform MPI 9.1.2.1 FixPack

Hi Sean,

I am trying to add rdmacm support to Platform MPI, and I noticed that connection setup on our test cluster is very slow: establishing the n^2 connections among 12 processes on 12 hosts takes about 12 seconds.

I also discovered that if I create some TCP sockets and use them to ensure that only one process at a time is calling rdmacm_connect to any given target, performance changes dramatically: I can then connect the 12 processes very quickly (I didn't measure exactly, but it is comparable to our old RDMA code). The order in which I connect processes already avoids flooding a single target with many rdmacm_connects at once, but without the extra TCP socket connections it is difficult to avoid the case where two processes call rdmacm_connect to the same target at roughly the same time.

I haven't looked at the MPICH code yet to see whether it has the same issue, but I will try that next.

Our test cluster is a bit old:

09:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

Is this a known problem? Are you aware of any issues that would shed some light on this?

Thanks,
Dave