Re: Segfault in mlx5 driver on infiniband after application fork

> On Feb 12, 2024, at 9:40 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> 
> On Mon, Feb 12, 2024 at 09:37:25AM -0500, Kevan Rehm wrote:
> 
>>> This was all fixed in the kernel, upgrade your kernel and forking
>>> works much more reliably, but I'm not sure this case will work.
>> 
>> I agree, that won’t help here.
>> 
>>> It is a libfabric problem if it is expecting memory to be registers
>>> for RDMA and be used by both processes in a fork. That cannot work.
>>> 
>>> Don't do that, or make the memory MAP_SHARED so that the fork children
>>> can access it.
>> 
>> Libfabric agrees, it wants to use separate registered memory in the
>> child, but there doesn’t seem to be a way to do this.
> 
> How can that be true? libfabric is the only entity that causes memory
> to be registered :)
> 
>>> The bugs seem a bit confused, there is no issue with ibv_device
>>> sharing. Only with actually sharing underlying registered memory. Ie
>>> sharing a SRQ memory pool between the child and parent.
>> 
>> Libfabric calls rdma_get_devices(), then walks the list looking for
>> the entry for the correct domain (mlx5_1).  It saves a pointer to
>> the matching dev_list entry which is an ibv_context structure.
>> Wrapped on that ibv_context is the mlx5 context which contains the
>> registered pages that had dontfork set when the parent established
>  ^^^^^^^^^^^^^^^^
> 
> It does not. context don't have pages, your problem comes from
> something else.

My terminology may be incorrect; my knowledge here is certainly limited.

See routine __add_page() in providers/mlx5/dbrec.c. It calls either mlx5_alloc_buf() or mlx5_alloc_buf_extern() to allocate a page. Those routines call ibv_dontfork_range() on the page after it has been allocated via posix_memalign(). __add_page() then adds the new page to the mlx5_context field dbr_available_pages. Later, mlx5_create_srq() calls mlx5_alloc_dbrec() to allocate space out of that page; it returns a pointer to a __be32, which mlx5_create_srq() stores in srq->db. The routine then executes "*srq->db = 0" to initialize the space.

When the parent process calls mlx5_create_srq() to create a SRQ, a page gets allocated and dontfork is set on it. After the fork, the child process calls rdma_get_devices(), which returns the parent's ibv_context, which in turn contains the above-mentioned mlx5_context. When the child calls mlx5_create_srq(), the "*srq->db = 0" statement segfaults because the space is allocated out of the same page that the parent allocated, and that page is not mapped in the child's address space.
> 
> Jason





