> On Feb 12, 2024, at 9:40 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Mon, Feb 12, 2024 at 09:37:25AM -0500, Kevan Rehm wrote:
>
>>> This was all fixed in the kernel, upgrade your kernel and forking
>>> works much more reliably, but I'm not sure this case will work.
>>
>> I agree, that won’t help here.
>>
>>> It is a libfabric problem if it is expecting memory to be registers
>>> for RDMA and be used by both processes in a fork. That cannot work.
>>>
>>> Don't do that, or make the memory MAP_SHARED so that the fork children
>>> can access it.
>>
>> Libfabric agrees, it wants to use separate registered memory in the
>> child, but there doesn’t seem to be a way to do this.
>
> How can that be true? libfabric is the only entity that causes memory
> to be registered :)
>
>>> The bugs seem a bit confused, there is no issue with ibv_device
>>> sharing. Only with actually sharing underlying registered memory. Ie
>>> sharing a SRQ memory pool between the child and parent.
>>
>> Libfabric calls rdma_get_devices(), then walks the list looking for
>> the entry for the correct domain (mlx5_1). It saves a pointer to
>> the matching dev_list entry which is an ibv_context structure.
>> Wrapped on that ibv_context is the mlx5 context which contains the
>> registered pages that had dontfork set when the parent established
>  ^^^^^^^^^^^^^^^^
>
> It does not. context don't have pages, your problem comes from
> something else.

My terminology may be incorrect; my knowledge is certainly limited.
See routine __add_page() in providers/mlx5/dbrec.c. It calls either
mlx5_alloc_buf() or mlx5_alloc_buf_extern() to allocate a page. Those
routines call ibv_dontfork_range() on the page after it has been
allocated via posix_memalign(). __add_page() then adds the new page to
the mlx5_context field dbr_available_pages. Later, mlx5_create_srq()
calls mlx5_alloc_dbrec() to allocate space out of that page.
mlx5_alloc_dbrec() returns a __be32 pointer, which mlx5_create_srq()
stores in srq->db. The routine then executes "*srq->db = 0" to
initialize the space.

When the parent process calls mlx5_create_srq() to create a SRQ, a page
gets allocated and dontfork is set on it. After the fork, the child
process calls rdma_get_devices(), which returns the parent's
ibv_context, which contains the above-mentioned mlx5_context. When the
child calls mlx5_create_srq(), the "*srq->db = 0" statement segfaults:
the space is allocated out of the same page that the parent allocated,
and because dontfork was set on that page, it is not mapped in the
child's memory.

>
> Jason
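
In case it helps, the failure mode can be reproduced without any RDMA
hardware. The sketch below is only an illustration, not rdma-core or
libfabric code: child_signal_after_dontfork() is a hypothetical helper,
and it assumes Linux and that ibv_dontfork_range() ultimately comes down
to madvise(MADV_DONTFORK) on the page, which is my understanding of
libibverbs. The parent allocates a page with posix_memalign(), marks it
dontfork, forks, and the child writes through the inherited pointer the
same way "*srq->db = 0" does.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if setup (allocation, madvise, fork, wait) failed.
 */
int child_signal_after_dontfork(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *page = NULL;

    if (page_size <= 0)
        return -1;
    if (posix_memalign(&page, (size_t)page_size, (size_t)page_size) != 0)
        return -1;

    /* Assumed stand-in for ibv_dontfork_range(): keep this page out of
     * any child created by fork(). */
    if (madvise(page, (size_t)page_size, MADV_DONTFORK) != 0) {
        free(page);
        return -1;
    }

    /* Plays the role of srq->db; volatile so the store is not elided. */
    volatile uint32_t *db = page;
    *db = 0; /* works in the parent, the page is mapped here */

    pid_t pid = fork();
    if (pid < 0) {
        madvise(page, (size_t)page_size, MADV_DOFORK);
        free(page);
        return -1;
    }
    if (pid == 0) {
        /* Same pointer value, but the page is not mapped in the child. */
        *db = 0;
        _exit(0);
    }

    int status = 0;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status))
            result = WTERMSIG(status);
        else if (WIFEXITED(status))
            result = 0;
    }

    /* Restore normal fork behaviour before handing the page back. */
    madvise(page, (size_t)page_size, MADV_DOFORK);
    free(page);
    return result;
}
```

On a Linux system I would expect this helper to report SIGSEGV for the
child. That matches what I see in mlx5_create_srq(): the pointer stored
in the inherited context is still valid in the parent's address space,
but not in the child's.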