race condition issue at remote proc startup

Hello,

We (at Kalray) are having some difficulties initializing a remoteproc device, and there seems to be no clean way (at least none that we know of) out of this problem.

We need the vrings defined in the resource table to be completely initialized before the remote processor is started. By "completely initialized" I mean that the vring device address (da) in the resource table must have been changed from 0xff..ff (FW_RSC_ADDR_ANY) to a proper address. Currently the remote processor is started before this initialization has completed, which creates a race condition between Linux and the remote processor. (In our architecture the processor running Linux is of the same kind as the embedded processor. This is why the problem shows up in our case, and probably not when the processor running Linux is much faster than the embedded one.)

Our best attempt so far is to configure the virtio rings earlier, i.e. during subdevice preparation instead of subdevice start.
In rproc_handle_vdev(), we change the code from
    rvdev->subdev.start = rproc_vdev_do_start;
to
    /* The da field of each vring must be initialized before powering up
     * the remoteproc, otherwise a race condition may occur.
     * Indeed the remoteproc may read it before it has been initialized.
     */
    rvdev->subdev.prepare = rproc_vdev_do_start;

This works, but it has undesired side effects. In particular, some notifications are sent (the remoteproc kick function is called) before the remote CPU has been started, so it cannot handle them; we simply ignore them when the remoteproc state is not RUNNING. This seems to solve our problem, but it is a particularly unpleasant way of doing so, and it might impact existing remoteproc devices. Do you have any suggestion for a cleaner way to solve this problem?

FYI, here is our architecture-specific remoteproc driver: https://github.com/kalray/linux_coolidge/blob/coolidge/drivers/remoteproc/kvx_remoteproc.c

PS: There seems to be a similar problem when the remote processor is stopped. The vring buffers are destroyed, and only afterwards is the remote processor stopped. Once again there is a race condition, as the remote processor might try to access the vrings after the host has destroyed them. The proposed change is as follows:
In rproc_handle_vdev(), change the code from
    rvdev->subdev.stop = rproc_vdev_do_stop;
to
    rvdev->subdev.unprepare = rproc_vdev_do_stop;

Note that this change has much less impact on existing remoteproc drivers, and it is symmetric to the previous change, which arguably makes it more logical.

PS2: I guess this issue never showed up before because most other use cases use fixed addresses in their resource tables rather than addresses allocated dynamically at runtime.

Regards,

--
Yann




