Il 28/04/22 19:31, Mathieu Poirier ha scritto:
On Fri, Jan 14, 2022 at 03:47:37PM +0100, AngeloGioacchino Del Regno wrote:
During execution of the worker that's used to register rpmsg devices
we are safely locking the channels mutex but, when creating a new
endpoint for such devices, we are registering a IPI on the SCP, which
then makes the SCP to trigger an interrupt, lock its own mutex and in
turn register more subdevices.
This creates a circular locking dependency situation, as the mtk_rpmsg
channels_lock will then depend on the SCP IPI lock.
[ 18.014514] Possible unsafe locking scenario:
[ 18.014515] CPU0 CPU1
[ 18.014517] ---- ----
[ 18.045467] lock(&mtk_subdev->channels_lock);
[ 18.045474] lock(&scp->ipi_desc[i].lock);
[ 18.228399] lock(&mtk_subdev->channels_lock);
[ 18.228405] lock(&scp->ipi_desc[i].lock);
[ 18.264405]
I finally understand the problem, something that would have been impossible
without the pastebin you provided in your latest email. Please add the content
of that pastebin to the changelog and send another revision (checkpatch
warnings can be ignored).
Thanks for giving it another look... I'll add a cover letter with the contents
of the pastebin to avoid possible confusion for anyone looking at the patch.
To solve this, simply unlock the channels_lock mutex before calling
mtk_rpmsg_register_device() and relock it right after, as safety is
still ensured by the locking mechanism that happens right after
through SCP.
The integrity of the subdev->channels list is guaranteed by relocking the
mutex, I'm not sure what "through SCP" adds to the sentence.
I'll clarify the commit message.
Notably, mtk_rpmsg_register_device() does not even require locking.
I don't agree with the above sentence - if locking doesn't happen in
mtk_rpmsg_create_device(), there can be two CPUs accessing the list at the same
time.
That's right, I have largely underestimated that for some reason, sorry about that.
Regards,
Angelo
Thanks,
Mathieu