Re: Segfault in mlx5 driver on infiniband after application fork

> Newer kernels are detected and disable the DONT_FORK calls in verbs.
> 
> rdma-core support is present since:
> 
> commit 67b00c3835a3480a035a9e1bcf5695f5c0e8568e
> Author: Gal Pressman <galpress@xxxxxxxxxx>
> Date:   Sun Apr 4 17:24:54 2021 +0300
> 
>    verbs: Report when ibv_fork_init() is not needed
> 
>    Identify kernels which do not require ibv_fork_init() to be called and
>    report it through the ibv_is_fork_initialized() verb.
> 
>    The feature detection is done through a new read-only attribute in the
>    get sys netlink command. If the attribute is not reported, assume old
>    kernel without COF support. If the attribute is reported, use the
>    returned value.
> 
>    This allows ibv_is_fork_initialized() to return the previously unused
>    IBV_FORK_UNNEEDED value, which takes precedence over the
>    DISABLED/ENABLED values. Meaning that if the kernel does not require a
>    call to ibv_fork_init(), IBV_FORK_UNNEEDED will be returned regardless
>    of whether ibv_fork_init() was called or not.
> 
>    Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx>
> 
> The kernel support was in v5.13-rc1~78^2~1
> 
> And backported in a few cases.
> 
> Jason

The above info was immensely helpful. I am running MOFED 23.10-OFED.23.10.0.5.5.1, so my kernel already has the fork improvements. However, there are still issues: the approach above relies on every caller checking ibv_is_fork_initialized() before calling ibv_fork_init(), and not everyone does this.
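To make the precedence rule from the commit message concrete, here is a minimal, self-contained model of it together with the caller-side guard. This is a sketch, not rdma-core code: the model_ names and the two boolean inputs are hypothetical stand-ins for enum ibv_fork_status, the kernel copy-on-fork attribute, and the library's internal state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum ibv_fork_status; prefixed so it does
 * not clash with <infiniband/verbs.h>. */
enum model_fork_status {
    MODEL_FORK_DISABLED,
    MODEL_FORK_ENABLED,
    MODEL_FORK_UNNEEDED,
};

/* Status as the commit message describes it: kernel copy-on-fork
 * support takes precedence over whether fork init was ever called. */
static enum model_fork_status
model_is_fork_initialized(bool kernel_has_cof, bool fork_init_called)
{
    if (kernel_has_cof)
        return MODEL_FORK_UNNEEDED;
    return fork_init_called ? MODEL_FORK_ENABLED : MODEL_FORK_DISABLED;
}

/* Caller-side guard: fork init is only worth calling when the status
 * is DISABLED (assumption: ENABLED means it has already been done). */
static bool model_should_call_fork_init(enum model_fork_status status)
{
    return status == MODEL_FORK_DISABLED;
}
```

In real code the guard would compare the result of ibv_is_fork_initialized() against IBV_FORK_UNNEEDED before calling ibv_fork_init(); the problem described here is that callers which skip that comparison still reach ibv_fork_init().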

ibv_get_device_list() unconditionally calls ibverbs_init() on the first call, and that routine calls ibv_fork_init() if either RDMAV_FORK_SAFE or IBV_FORK_SAFE is set, even if the kernel has the fork enhancements. I wrapped that call in a check of ibv_is_fork_initialized() and skipped ibv_fork_init() if IBV_FORK_UNNEEDED was returned. With this change my small test program ran successfully, but the original benchmark still crashed.
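The decision made during library init after that first patch can be sketched as a pure function. The helper name and its parameters are hypothetical: the two strings stand for the getenv() results of the two environment variables, and kernel_fork_unneeded stands for "ibv_is_fork_initialized() returned IBV_FORK_UNNEEDED".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of rdma-core.
 * rdmav_fork_safe / ibv_fork_safe: values of the two fork-safety
 * environment variables, or NULL when unset.
 * kernel_fork_unneeded: true when the kernel reports that
 * ibv_fork_init() is not required. */
static bool init_should_call_fork_init(const char *rdmav_fork_safe,
                                       const char *ibv_fork_safe,
                                       bool kernel_fork_unneeded)
{
    /* Either variable being set (to any value) requests fork safety. */
    bool requested = rdmav_fork_safe != NULL || ibv_fork_safe != NULL;

    /* The patched init skips the call when the kernel handles fork. */
    return requested && !kernel_fork_unneeded;
}
```

This only covers the path through ibverbs_init(); any other code that calls ibv_fork_init() directly bypasses this check.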

The benchmark uses MPI. It turns out that mpi4py calls PMPI_Init(), which eventually makes UCX calls, and uct_ib_md_open() in UCX calls ibv_fork_init() without first calling ibv_is_fork_initialized(). It looks at an md_config->fork_init variable rather than checking for kernel support. To cover all potential cases, I changed my rdma-core patch to call ibv_is_fork_initialized() inside ibv_fork_init() itself and return 0 without creating mm_root if kernel support is present. With this change, MPI and the original benchmark work.
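The order of checks in the patched ibv_fork_init() can be modelled in isolation as follows. This is a sketch under assumptions: the struct and function are hypothetical stand-ins for the library's internal state, and the meaning given to too_late reflects my reading of the code rather than a documented contract.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state ibv_fork_init() consults. */
struct model_fork_state {
    bool mm_root_created;  /* tracking structure already exists */
    bool kernel_unneeded;  /* stands in for IBV_FORK_UNNEEDED */
    bool too_late;         /* assumed: memory ranges were already
                            * handled without fork init */
};

/* Returns 0 or EINVAL, following the check order of the patched code. */
static int model_fork_init(struct model_fork_state *st)
{
    if (st->mm_root_created)
        return 0;
    if (st->kernel_unneeded)
        return 0;          /* new early return: mm_root is not created */
    if (st->too_late)
        return EINVAL;
    st->mm_root_created = true;
    return 0;
}
```

One consequence of this ordering is that a late call on a kernel with fork support returns 0 instead of EINVAL, which is what lets callers that never check ibv_is_fork_initialized() proceed.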

Is this a reasonable fix that could be added to rdma?

[root@delphi-029 libibverbs]# diff -C 5 memory.c.orig memory.c
*** memory.c.orig 2024-02-13 09:45:28.078997178 -0600
--- memory.c 2024-02-13 09:27:46.901699958 -0600
***************
*** 140,149 ****
--- 140,152 ----
  		huge_page_enabled = 1;

  	if (mm_root)
  		return 0;

+ 	if (ibv_is_fork_initialized() == IBV_FORK_UNNEEDED)
+ 		return 0;
+
  	if (too_late)
  		return EINVAL;

  	fprintf(stderr, "ibv_fork_init creating mm_root\n");
  	page_size = sysconf(_SC_PAGESIZE);