On Wed, Jan 24, 2018 at 11:08 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> On Wed, Jan 24, 2018 at 10:22:53AM -0600, Rohit Zambre wrote:
>
>> (1) First, is this a surprising result or is the 2x difference
>> actually expected behavior?
>
> Maybe, there are lots of locks in one process, for instance glibc's
> malloc has locking - so any memory allocation anywhere in the
> application's processing path will cause lock contention. The issue may
> have nothing to do with RDMA.

There are no mallocs in the critical path of the benchmark. In the
1-process multi-threaded case, the mallocs for resource creation all
happen before the OpenMP parallel region is created. Here's a snapshot
of the parallel region that contains the critical path:

#pragma omp parallel
{
    int i = omp_get_thread_num(), k;
    int cqe_count = 0;
    int post_count = 0;
    int comp_count = 0;
    int posts = 0;
    struct ibv_send_wr *bad_send_wqe;
    struct ibv_wc *WC = (struct ibv_wc*) malloc(qp_depth * sizeof(struct ibv_wc)); // qp_depth is 128 (adopted from perftest)

    #pragma omp single
    { // only one thread will execute this
        MPI_Barrier(MPI_COMM_WORLD);
    } // implicit barrier for the threads

    if (i == 0)
        t_start = MPI_Wtime();

    /* Critical Path Start */
    while (post_count < posts_per_qp || comp_count < posts_per_qp) { // posts_per_qp = num_of_msgs / num_qps
        /* Post */
        posts = min( (posts_per_qp - post_count), (qp_depth - (post_count - comp_count)) );
        for (k = 0; k < posts; k++)
            ret = ibv_post_send(qp[i], &send_wqe[i], &bad_send_wqe);
        post_count += posts;

        /* Poll */
        if (comp_count < posts_per_qp) {
            cqe_count = ibv_poll_cq(cq[i], num_comps, WC); // num_comps = qp_depth
            comp_count += cqe_count;
        }
    }
    /* Critical Path End */

    if (i == 0)
        t_end = MPI_Wtime();
}

> There is also some locking inside the userspace mlx5 driver that may
> contend depending on how your process has set things up.

I missed mentioning this earlier, but I collected the numbers with
MLX5_SINGLE_THREADED set, since none of the resources are shared
between the threads. So the userspace driver wasn't taking any locks.

> The entire send path is in user space so there is no kernel component
> here.

Yes, that's correct. My concern was that, during resource creation, the
kernel might be sharing some resource within a process, or multiplexing
hardware contexts through control groups. Is it safe for me to conclude
that separate, independent contexts/bfregs are assigned when a process
calls ibv_open_device multiple times? (A minimal sketch of what I mean
by multiple opens is below.)

> Jason

Thanks,
Rohit Zambre
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
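
To make the question concrete, here is a minimal sketch (not the actual
benchmark code) of what "calling ibv_open_device multiple times from one
process" means, one verbs context per thread. NUM_THREADS and the
hard-coded device index are placeholders, error handling is omitted, and
the placement of setenv() assumes the mlx5 provider reads
MLX5_SINGLE_THREADED when the context is opened:

#include <stdlib.h>
#include <infiniband/verbs.h>

#define NUM_THREADS 16 /* placeholder for the number of OpenMP threads */

int main(void)
{
    struct ibv_device **dev_list;
    struct ibv_context *ctx[NUM_THREADS];
    int i, num_devices;

    /* No resources are shared between threads, so the provider's
     * internal locking is disabled. Assumption: the variable is read
     * at context-open time, so set it before ibv_open_device(). */
    setenv("MLX5_SINGLE_THREADED", "1", 1);

    dev_list = ibv_get_device_list(&num_devices);

    /* Open the same HCA once per thread; the question is whether each
     * of these opens is backed by its own hardware context/bfregs. */
    for (i = 0; i < NUM_THREADS; i++)
        ctx[i] = ibv_open_device(dev_list[0]);

    ibv_free_device_list(dev_list);

    /* ... each thread i creates its own PD, CQ, and QP on ctx[i] and
     * runs the post/poll loop from the snapshot above ... */

    for (i = 0; i < NUM_THREADS; i++)
        ibv_close_device(ctx[i]);

    return 0;
}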
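
In this sketch each thread would create its own PD, CQ, and QP against
its private ctx[i], mirroring the N-independent-processes case; whether
each of those opens really gets separate, independent contexts/bfregs
from the kernel is exactly the open question above.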