> we got a bug report from a customer that FhGFS server daemons frequently
> crash. Looking into it, we are almost sure it is related to
> librdmacm/kernel-ib. So the crash in librdmacm resolves to:
>
> > (gdb) l *(rdma_get_cm_event+0x102)
> > 0x3d6dc04522 is in rdma_get_cm_event (src/cma.c:1975).
> > 1970                    break;
> > 1971            default:
> > 1972                    evt->id_priv = (void *) (uintptr_t) resp.uid;
> > 1973                    evt->event.id = &evt->id_priv->id;
> > 1974                    evt->event.status = resp.status;
> > 1975                    if (ucma_is_ud_qp(evt->id_priv->id.qp_type))
> > 1976                            ucma_copy_ud_event(evt, &resp.param.ud);
> > 1977                    else
> > 1978                            ucma_copy_conn_event(evt, &resp.param.conn);
> > 1979                    break;
>
> Now the complete shows that the issue comes up directly after initiating
> a connection and we are just querying the file descriptor for events. So
> from our point of view there is not much we can do about it.
>
> Possibly this is already fixed by commit
> 418edaaba96e58112b15c82b4907084e2a9caf42 and this commit is also not yet
> in the latest RHEL kernel being used on the customer system.
> However, the commit message states it is for RDMA_CM_EVENT_ESTABLISHED
> only, while the resolved address above points to another event. Does the
> commit really only fix this event or would it also fix further events?

The above commit fixes an issue where the uid can be returned as 0 from the
kernel. A call to resolve an address should not have this issue. The issue
is limited to connections formed from a listen request. The reason is that
passive side connections create the kernel id first, which is then linked
up with the user space id. Active side processing creates the user space
id first, then the kernel id.

> While looking where file->mut is being set, I notice that in
> ucma_create_id() the call of ucma_alloc_ctx() is protected by a mutex,
> but the mutex is immediately given up after that call. In
> ucma_alloc_ctx() the ctx is already added to file->ctx_list, but the
> ctx->uid and ctx->cm_id are then modified outside a mutex in
> ucma_create_id(). Shouldn't the mutex_unlock() be done after assigning
> those values?

It shouldn't matter, as long as the uid and cm_id are set before any events
can be generated. Events are not generated on newly created ids.

- Sean
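
To make the uid 0 case concrete, here is a minimal user-space sketch of the
assignments at src/cma.c:1972-1975. The model_* types and
model_fill_event() are hypothetical stand-ins, not librdmacm code, and the
early return is only an illustration; the fix discussed above is the kernel
commit, not a user-space check. With a uid of 0, id_priv is NULL, taking
&id_priv->id in practice only computes an address, and the first real
dereference is the qp_type read, which would be consistent with the
faulting line in the gdb listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library's private structures. */
struct model_id      { int qp_type; };
struct model_id_priv { struct model_id id; };
struct model_resp    { uint64_t uid; int status; };

struct model_event {
    struct model_id_priv *id_priv;
    struct model_id      *id;
    int                   status;
};

/*
 * Mirrors the three assignments in the listing, with one added guard:
 * a uid of 0 would turn into a NULL id_priv, so it is rejected before
 * anything reads through the pointer.
 * Returns 0 on success, -1 if the reported uid is 0.
 */
static int model_fill_event(struct model_event *evt,
                            const struct model_resp *resp)
{
    assert(evt != NULL && resp != NULL);

    if (resp->uid == 0)
        return -1;                    /* no user space id linked yet */

    evt->id_priv = (struct model_id_priv *)(uintptr_t)resp->uid;
    evt->id      = &evt->id_priv->id; /* address computation only */
    evt->status  = resp->status;
    return 0;
}
```

A caller in this model would treat -1 as "event without a usable id" and
skip it; whether that is appropriate for a real application is an
assumption and depends on how the event should be handled.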