Re: Segfault in mlx5 driver on infiniband after application fork

> 
> Newer kernels are detected and disable the DONT_FORK calls in verbs.
> 
> rdma-core support is present since:
> 
> commit 67b00c3835a3480a035a9e1bcf5695f5c0e8568e
> Author: Gal Pressman <galpress@xxxxxxxxxx>
> Date:   Sun Apr 4 17:24:54 2021 +0300
> 
>    verbs: Report when ibv_fork_init() is not needed
> 
>    Identify kernels which do not require ibv_fork_init() to be called and
>    report it through the ibv_is_fork_initialized() verb.
> 
>    The feature detection is done through a new read-only attribute in the
>    get sys netlink command. If the attribute is not reported, assume old
>    kernel without COF support. If the attribute is reported, use the
>    returned value.
> 
>    This allows ibv_is_fork_initialized() to return the previously unused
>    IBV_FORK_UNNEEDED value, which takes precedence over the
>    DISABLED/ENABLED values. Meaning that if the kernel does not require a
>    call to ibv_fork_init(), IBV_FORK_UNNEEDED will be returned regardless
>    of whether ibv_fork_init() was called or not.
> 
>    Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx>
> 
> The kernel support was in v5.13-rc1~78^2~1
> 
> And backported in a few cases.

To work around this, I had to run my benchmark under gdb with a breakpoint on ibv_fork_init() to track down every caller of that function; the callers turned out to be both UCX and Libfabric. I then had to download the source repos, read the code, and work out for each one which environment variable controls the calls to ibv_fork_init(). For Libfabric, I had to ensure that RDMA_FORK_SAFE and IBV_FORK_SAFE were not set, both of which my team members routinely set. For UCX, I had to set UCX_IB_FORK_INIT=no; otherwise UCX always calls ibv_fork_init() by default. With UCX_IB_FORK_INIT=no, scary error messages about registered memory corruption are printed to stderr whenever there is a fork, even though that is no longer true with up-to-date kernels. Folks who don't know the details of ibv_fork_init() behavior are going to be reluctant to set UCX_IB_FORK_INIT=no.

If ibv_fork_init() checked the kernel and simply returned without initializing mm_root when the kernel has enhanced fork support, all of the environment variable hassles would go away: the settings would no longer matter, and ibv_fork_init() would always do the right thing. This seems like a big win to me. Am I missing some downside?
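
To make the proposal concrete, here is a rough sketch of the decision I have in mind, written as a standalone helper rather than as a patch to rdma-core. The enum is a local, hypothetical stand-in for the IBV_FORK_DISABLED / IBV_FORK_ENABLED / IBV_FORK_UNNEEDED values that ibv_is_fork_initialized() reports, so that the snippet compiles without libibverbs; the names needs_fork_init() and app_may_fork are mine and do not exist in any library.

```c
#include <stdbool.h>

/* Hypothetical local mirror of the three fork states reported by
 * ibv_is_fork_initialized(); declared here only so the sketch builds
 * without libibverbs headers. */
enum fork_status {
    FORK_DISABLED,  /* ibv_fork_init() has not been called */
    FORK_ENABLED,   /* ibv_fork_init() has been called */
    FORK_UNNEEDED   /* kernel has enhanced fork support */
};

/* Return true only when the legacy fork setup is still required.
 * Per the quoted commit message, UNNEEDED takes precedence over the
 * other two states, so it is checked first. */
static bool needs_fork_init(enum fork_status status, bool app_may_fork)
{
    switch (status) {
    case FORK_UNNEEDED:
        return false;         /* kernel handles fork; skip the setup */
    case FORK_ENABLED:
        return false;         /* already initialized */
    case FORK_DISABLED:
        return app_may_fork;  /* old kernel: needed only if we fork */
    }
    return app_may_fork;      /* unknown value: stay conservative */
}
```

In a real caller the status argument would come from ibv_is_fork_initialized(), and ibv_fork_init() would be called only when the helper returns true. What I am suggesting is that the UNNEEDED check live inside ibv_fork_init() itself, so that callers such as UCX and Libfabric get this behavior without any changes or environment variables.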

Thanks, Kevan








