Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

Christian König <christian.koenig@xxxxxxx> · Mon, 11 Sep 2017 11:57:57 +0200

Am 11.09.2017 um 11:23 schrieb Chris Wilson:
Quoting Christian König (2017-09-11 10:06:50)
Am 11.09.2017 um 10:59 schrieb Chris Wilson:
Quoting Christian König (2017-09-11 09:50:40)
Sorry for the delayed response, but your mail somehow ended up in the
Spam folder.

Am 04.09.2017 um 15:40 schrieb Chris Wilson:
Quoting Christian König (2017-09-04 14:27:33)
From: Christian König <christian.koenig@xxxxxxx>

The logic is buggy and unnecessary complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessary mean that there is no fence at all.

It usually mean that the fence was replaced by a new one and in this situation
we certainly want to have the new one as result and *NOT* NULL.
Which is not guaranteed by the code you wrote either.

The point of the comment is that the mb is only inside the successful
kref_atomic_inc_unless_zero, and that only after that mb do you know
whether or not you have the current fence.

You can argue that you want to replace the
        if (!dma_fence_get_rcu())
                return NULL
with
        if (!dma_fence_get_rcu()
                continue;
but it would be incorrect to say that by simply ignoring the
post-condition check that you do have the right fence.
You are completely missing the point here.

It is irrelevant if you have the current fence or not when you return.
You can only guarantee that it is the current fence when you take a look
and that is exactly what we want to avoid.

So the existing code is complete nonsense. Instead what we need to
guarantee is that we return *ANY* fence which we can grab a reference for.
Not quite. We can grab a reference on a fence that was already freed and
reused between the rcu_dereference() and dma_fence_get_rcu().
Reusing a memory structure before the RCU grace period is completed is
illegal, otherwise the whole RCU approach won't work.
RCU only protects that the pointer remains valid. If you use
SLAB_TYPESAFE_BY_RCU, it is possible to reuse the pointer within a grace
period. It does happen and that is the point the comment is trying to
make.

Yeah, but that is illegal with a fence objects.

When anybody allocates fences this way it breaks at least 
reservation_object_get_fences_rcu(), 
reservation_object_wait_timeout_rcu() and 
reservation_object_test_signaled_single().

Cause all of them rely on dma_fence_get() to return NULL when the fence 
isn't valid any more to restart the operation.

When dma_fence_get_rcu() returns a reallocated fence the operation 
wouldn't correctly restart and the end result most likely not be correct 
at all.

Using SLAB_TYPESAFE_BY_RCU is only valid if you can ensure that you have 
the right object using a second criteria and that is not the case with 
fences.

Regards,
Christian.