Re: Is it OK to pass non-acquired objects to kfree?

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Thu, 10 Sep 2015 15:17:25 +0200

On Thu, Sep 10, 2015 at 2:47 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> On 09/10/2015 11:55 AM, Dmitry Vyukov wrote:
>> On Thu, Sep 10, 2015 at 1:31 AM, Christoph Lameter <cl@xxxxxxxxx> wrote:
>>> On Wed, 9 Sep 2015, Paul E. McKenney wrote:
>>>
>>>> Either way, Dmitry's tool got a hit on real code using the slab
>>>> allocators.  If that hit is a false positive, then clearly Dmitry
>>>> needs to fix his tool, however, I am not (yet) convinced that it is a
>>>> false positive.  If it is not a false positive, we might well need to
>>>> articulate the rules for use of the slab allocators.
>>>
>>> Could I get a clear definition as to what exactly is positive? Was this
>>> using SLAB, SLUB or SLOB?
>>>
>>>> > This would all use per cpu data. As soon as a handoff is required within
>>>> > the allocators locks are being used. So I would say no.
>>>>
>>>> As in "no, it is not necessary for the caller of kfree() to invoke barrier()
>>>> in this example", right?
>>>
>>> Actually SLUB contains a barrier already in kfree(). Has to be there
>>> because of the way the per cpu pointer is being handled.
>>
>> The positive was a report of data races in the following code:
>>
>> // kernel/pid.c
>>          if ((atomic_read(&pid->count) == 1) ||
>>               atomic_dec_and_test(&pid->count)) {
>>                  kmem_cache_free(ns->pid_cachep, pid);
>>                  put_pid_ns(ns);
>>          }
>>
>> //drivers/tty/tty_buffer.c
>> while ((next = buf->head->next) != NULL) {
>>      tty_buffer_free(port, buf->head);
>>      buf->head = next;
>> }
>>
>> Namely, the tool reported data races between usage of the object in
>> other threads before they released the object and kfree.
>>
>
> [...]
>
>> There are memory allocator implementations that do reads and writes of
>> the object, and there are memory allocator implementations that do not
>> do any barriers on fast paths. From this follows that objects must be
>> passed in quiescent state to kfree.
>> Now, kernel memory model says "A load-load control dependency requires
>> a full read memory barrier".
>
> But a load-load dependency is something different than writes from
> kmem_cache_free() being visible before the atomic_read(), right?

Right.
As of now, the code has a problem with both reads and writes from kfree
being hoisted above the pid->count == 1 check.
So far I have only demonstrated the problem for reads.

> So the problem you are seeing is a different one, that some other CPUs are
> still writing to the object after they decrease the count to 1?

I don't see any actual manifestation of the issue.  Our tool runs on
x86, so the chances of an actual manifestation are pretty low. But the tool
verifies code against an abstract memory model and thus can detect
potential manifestations on ARM, POWER, Alpha or whatever.

>> From this follows that the following code is broken:
>>
>> // kernel/pid.c
>>          if ((atomic_read(&pid->count) == 1) ||
>>               atomic_dec_and_test(&pid->count)) {
>>                  kmem_cache_free(ns->pid_cachep, pid);
>>                  put_pid_ns(ns);
>>          }
>>
>> and it should be:
>>
>> // kernel/pid.c
>>          if ((smp_load_acquire(&pid->count) == 1) ||
>
> Is that enough? Doesn't it need a pairing smp_store_release?
>
>>               atomic_dec_and_test(&pid->count)) {
>
> A prior release from another thread (that sets the counter to 1) would be done
> by this atomic_dec_and_test() (this all is put_pid() function).
> Does that act as a release? memory-barriers.txt seems to say it does.

Yes, release is required and it is provided by atomic_dec_and_test.
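
To make the pairing concrete, here is a minimal userspace sketch of the
same pattern using C11 atomics. It is only an illustration, not the
kernel code: struct obj, obj_new() and obj_put() are hypothetical names,
and memory_order_acq_rel on the decrement is my assumed stand-in for the
ordering atomic_dec_and_test() provides in the kernel. The acquire load
on the fast path is what keeps accesses made by free() from being
reordered before the count == 1 check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid; not the kernel definition. */
struct obj {
	atomic_int count;
	int payload;
};

/* Hypothetical helper: allocate an object that starts with `refs` references. */
static struct obj *obj_new(int refs)
{
	struct obj *o = malloc(sizeof(*o));

	assert(refs > 0);
	if (!o)
		return NULL;
	atomic_init(&o->count, refs);
	o->payload = 0;
	return o;
}

/*
 * Drop one reference. Returns true if this call freed the object.
 *
 * Fast path: if the count is already 1 we hold the last reference, so
 * no decrement is needed. The acquire load pairs with the release half
 * of the decrement done by whichever thread dropped the count to 1, so
 * that thread's accesses to the object happen-before free().
 *
 * Slow path: the acq_rel decrement releases our own accesses and, when
 * the previous value was 1, acquires everyone else's.
 */
static bool obj_put(struct obj *o)
{
	if (atomic_load_explicit(&o->count, memory_order_acquire) == 1 ||
	    atomic_fetch_sub_explicit(&o->count, 1,
				      memory_order_acq_rel) == 1) {
		free(o);
		return true;
	}
	return false;
}
```

With two references, the first obj_put() only decrements and returns
false; the second takes the fast path, frees the object and returns true.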

> So yeah your patch seems to be needed and I don't think it should be the sl*b
> providing the necessary barrier here. It should be on the refcounting IMHO. That
> has the knowledge of correct ordering depending on the pid->count, sl*b has no
> such knowledge.

Thanks for confirmation!
Yes, I completely agree that code that calls kfree must provide the
necessary synchronization. It would just be a massive pessimization if
kfree did a barrier.

>>                  kmem_cache_free(ns->pid_cachep, pid);
>>                  put_pid_ns(ns);
>>          }

-- 
Dmitry Vyukov, Software Engineer, dvyukov@xxxxxxxxxx
