Re: Segfault in mlx5 driver on infiniband after application fork

On Sun, Feb 11, 2024 at 02:24:16PM -0500, Kevan Rehm wrote:
> 
> >> An application started by pytorch does a fork, then the child
> >> process attempts to use libfabric to open a new DAOS infiniband
> >> endpoint.  The original endpoint is owned and still in use by the
> >> parent process.
> >>
> >> When the parent process created the endpoint (fi_fabric,
> >> fi_domain, fi_endpoint calls), the mlx5 driver allocated memory
> >> pages for use in SRQ creation, and issued a madvise to say that
> >> the pages are DONTFORK.  These pages are associated with the
> >> domain’s ibv_device, which is cached in the driver.  After the fork
> >> when the child process calls fi_domain for its new endpoint, it
> >> gets the ibv_device that was cached at the time it was created by
> >> the parent.  The child process immediately segfaults when trying
> >> to create a SRQ, because the pages associated with that
> >> ibv_device are not in the child’s memory.  There doesn’t appear
> >> to be any way for a child process to create a fresh endpoint
> >> because of the caching being done for ibv_devices.
> 
> > For anyone who is interested in this issue, please follow the links below:
> > https://github.com/ofiwg/libfabric/issues/9792
> > https://daosio.atlassian.net/browse/DAOS-15117
> > 
> > Regarding the issue, I don't know if mlx5 actively used to run
> > libfabric, but the mentioned call to ibv_dontfork_range() existed from
> > prehistoric era.
> 
> Yes, libfabric has used mlx5 for a long time.
> 
> > Do you have any environment variables set related to rdma-core?
> > 
> IBV_FORK_SAFE is set to 1
> 
> > Is it related to ibv_fork_init()? It must be called when fork() is called.
> 
> Calling ibv_fork_init() doesn’t help, because it immediately checks mm_root, sees it is non-zero (from the parent process’s prior call), and returns doing nothing.
> There is now a simplified test case, see https://github.com/ofiwg/libfabric/issues/9792 for ongoing analysis.

This was all fixed in the kernel; upgrade your kernel and forking
works much more reliably, but I'm not sure this particular case will work.

It is a libfabric problem if it expects memory to be registered
for RDMA and then used by both processes across a fork. That cannot work.

Don't do that, or make the memory MAP_SHARED so that forked children
can access it.

The bugs seem a bit confused: there is no issue with ibv_device
sharing, only with actually sharing the underlying registered memory,
i.e. sharing an SRQ memory pool between the child and parent.

"fork safe" does not magically make all scenarios work, it is
targetted at a specific use case where a rdma using process forks and
the fork does not continue to use rdma.

Jason