Re: [PATCH -v4 3/4] lib, Make gen_pool memory allocator lockless

Huang Ying <ying.huang@xxxxxxxxx> · Fri, 29 Apr 2011 13:24:10 +0800

On 04/29/2011 12:23 PM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@xxxxxxxxx) wrote:
>> On 04/29/2011 09:11 AM, Mathieu Desnoyers wrote:
>>> * Huang Ying (ying.huang@xxxxxxxxx) wrote:
>>>> Hi, Mathieu,
>>>>
>>>> Thanks for your comments.
>>>>
>>>> On 04/28/2011 10:37 PM, Mathieu Desnoyers wrote:
>>>>> * Huang Ying (ying.huang@xxxxxxxxx) wrote:
>>>> [snip]
>>>>>>
>>>>>> +/**
>>>>>> + * gen_pool_for_each_chunk - iterate over chunks of generic memory pool
>>>>>> + * @chunk:   the struct gen_pool_chunk * to use as a loop cursor
>>>>>> + * @pool:    the generic memory pool
>>>>>> + *
>>>>>> + * Not lockless, proper mutual exclusion is needed to use this macro
>>>>>> + * with other gen_pool function simultaneously.
>>>>>> + */
>>>>>> +#define gen_pool_for_each_chunk(chunk, pool)                 \
>>>>>> +     list_for_each_entry_rcu(chunk, &(pool)->chunks, next_chunk)
>>>>>
>>>>> Is it just me or this macro is never used ? Maybe you should consider
>>>>> removing it.
>>>>
>>>> This macro is not used in this patch.  But it is used in 4/4 of the
>>>> patchset to free the backing pages before destroy the pool.
>>>
>>> Depending on how frequently you want to use it, you might want to use
>>> list_for_each_entry_rcu directly rather than a macro wrapper. E.g.  for
>>> 2-3 uses, adding a macro just obfuscates the code IMHO (e.g. you don't
>>> know it iterates on a RCU list by looking at the caller code).
>>
>> Yes. gen_pool_for_each_chunk() is not a good wrapper.  I just don't want
>> to expose too much implementation details to users, after all, we are
>> working on library code.  Maybe something like below is better?
>>
>> void gen_pool_for_each_chunk(struct gen_pool *pool, void (*func)(struct
>> gen_pool *pool, struct gen_pool_chunk *chunk)) {
>> 	rcu_read_lock();
>> 	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
>> 		func(pool, chunk);
>> 	rcu_read_unlock();
>> }
> 
> If it is expected to be exposed to other parts of the kernel, indeed we
> should not expect the caller to magically know they must hold the rcu
> read-side lock.

Yes.  So I want to help the users via acquiring/releasing the rcu
read-side lock by ourselves.

> I'm not sure whether this iterator is necessary though. Just a comment
> could suffice.

I try to hide some implementation details from the users here.  So that
we can change the implementation easier if necessary in the future.  I
will add comments to warn users that the callback function is executed
in rcu_read_lock environment.

>>>>
>>>> [snip]
>>>>>> @@ -108,43 +226,50 @@ EXPORT_SYMBOL(gen_pool_destroy);
>>>>>>   * @size: number of bytes to allocate from the pool
>>>>>>   *
>>>>>>   * Allocate the requested number of bytes from the specified pool.
>>>>>> - * Uses a first-fit algorithm.
>>>>>> + * Uses a first-fit algorithm. Can not be used in NMI handler on
>>>>>> + * architectures without NMI-safe cmpxchg implementation.
>>>>>>   */
>>>>>>  unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>>>>>  {
>>>>>> -     struct list_head *_chunk;
>>>>>>       struct gen_pool_chunk *chunk;
>>>>>> -     unsigned long addr, flags;
>>>>>> +     unsigned long addr;
>>>>>>       int order = pool->min_alloc_order;
>>>>>> -     int nbits, start_bit, end_bit;
>>>>>> +     int nbits, start_bit = 0, end_bit, remain;
>>>>>> +
>>>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>>> +     BUG_ON(in_nmi());
>>>>>> +#endif
>>>>>>
>>>>>>       if (size == 0)
>>>>>>               return 0;
>>>>>>
>>>>>>       nbits = (size + (1UL << order) - 1) >> order;
>>>>>> -
>>>>>> -     read_lock(&pool->lock);
>>>>>> -     list_for_each(_chunk, &pool->chunks) {
>>>>>> -             chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
>>>>>> +     rcu_read_lock();
>>>>>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>>>> +             if (size > atomic_read(&chunk->avail))
>>>>>> +                     continue;
>>>>>>
>>>>>>               end_bit = (chunk->end_addr - chunk->start_addr) >> order;
>>>>>> -
>>>>>> -             spin_lock_irqsave(&chunk->lock, flags);
>>>>>> -             start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
>>>>>> -                                             nbits, 0);
>>>>>> -             if (start_bit >= end_bit) {
>>>>>> -                     spin_unlock_irqrestore(&chunk->lock, flags);
>>>>>> +retry:
>>>>>> +             start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit,
>>>>>> +                                                    start_bit, nbits, 0);
>>>>>> +             if (start_bit >= end_bit)
>>>>>>                       continue;
>>>>>> +             remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
>>>>>> +             if (remain) {
>>>>>> +                     remain = bitmap_clear_ll(chunk->bits, start_bit,
>>>>>> +                                              nbits - remain);
>>>>>> +                     BUG_ON(remain);
>>>>>> +                     goto retry;
>>>>>>               }
>>>>>>
>>>>>>               addr = chunk->start_addr + ((unsigned long)start_bit << order);
>>>>>> -
>>>>>> -             bitmap_set(chunk->bits, start_bit, nbits);
>>>>>> -             spin_unlock_irqrestore(&chunk->lock, flags);
>>>>>> -             read_unlock(&pool->lock);
>>>>>> +             size = nbits << order;
>>>>>> +             atomic_sub(size, &chunk->avail);
>>>>>> +             rcu_read_unlock();
>>>>>
>>>>> I don't really like seeing a rcu_read_unlock() within a rcu list
>>>>> iteration (even if it comes right before a "return"). Doing:
>>>>>
>>>>> unsigned long addr = 0;
>>>>>
>>>>> rcu_read_lock();
>>>>> list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>>>   if (...)
>>>>>     continue;
>>>>>   ...
>>>>>   addr = ...;
>>>>>   break;
>>>>> }
>>>>> rcu_read_unlock();
>>>>> return addr;
>>>>>
>>>>> Would be more symmetric, and would remove one return path, which makes
>>>>> the code easier to modify in the future.
>>>>
>>>> Unlock in loop is common in Linux kernel.  Sometimes it makes code
>>>> cleaner (but not always).  Yes, for this case, we can avoid unlock in
>>>> loop easily.  But for the next case it is not so clean.
>>>
>>> See comment below,
>>>
>>>>
>>>>>>               return addr;
>>>>>>       }
>>>>>> -     read_unlock(&pool->lock);
>>>>>> +     rcu_read_unlock();
>>>>>>       return 0;
>>>>>>  }
>>>>>>  EXPORT_SYMBOL(gen_pool_alloc);
>>>>>> @@ -155,33 +280,73 @@ EXPORT_SYMBOL(gen_pool_alloc);
>>>>>>   * @addr: starting address of memory to free back to pool
>>>>>>   * @size: size in bytes of memory to free
>>>>>>   *
>>>>>> - * Free previously allocated special memory back to the specified pool.
>>>>>> + * Free previously allocated special memory back to the specified
>>>>>> + * pool.  Can not be used in NMI handler on architectures without
>>>>>> + * NMI-safe cmpxchg implementation.
>>>>>>   */
>>>>>>  void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>>>>>>  {
>>>>>> -     struct list_head *_chunk;
>>>>>>       struct gen_pool_chunk *chunk;
>>>>>> -     unsigned long flags;
>>>>>>       int order = pool->min_alloc_order;
>>>>>> -     int bit, nbits;
>>>>>> +     int start_bit, nbits, remain;
>>>>>>
>>>>>> -     nbits = (size + (1UL << order) - 1) >> order;
>>>>>> -
>>>>>> -     read_lock(&pool->lock);
>>>>>> -     list_for_each(_chunk, &pool->chunks) {
>>>>>> -             chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
>>>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>>> +     BUG_ON(in_nmi());
>>>>>> +#endif
>>>>>>
>>>>>> +     nbits = (size + (1UL << order) - 1) >> order;
>>>
>>> you could add:
>>>
>>>   remain = nbits;
>>>
>>>>>> +     rcu_read_lock();
>>>>>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>>>>>               if (addr >= chunk->start_addr && addr < chunk->end_addr) {
>>>>>>                       BUG_ON(addr + size > chunk->end_addr);
>>>>>> -                     spin_lock_irqsave(&chunk->lock, flags);
>>>>>> -                     bit = (addr - chunk->start_addr) >> order;
>>>>>> -                     while (nbits--)
>>>>>> -                             __clear_bit(bit++, chunk->bits);
>>>>>> -                     spin_unlock_irqrestore(&chunk->lock, flags);
>>>>>> -                     break;
>>>>>> +                     start_bit = (addr - chunk->start_addr) >> order;
>>>
>>> You could turn this:
>>>
>>>>>> +                     remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>>>>>> +                     BUG_ON(remain);
>>>>>> +                     size = nbits << order;
>>>>>> +                     atomic_add(size, &chunk->avail);
>>>
>>> into:
>>>
>>>   remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>>>   size = nbits << order;
>>>   atomic_add(size, &chunk->avail);
>>>   break;
>>>     
>>>
>>>>>> +                     rcu_read_unlock();
>>>>>
>>>>> Same comment as above apply here.
>>>>
>>>> It is harder to remove unlock in loop here.  An extra variable should be
>>>> used to indicate that something is freed from the pool.  Do you think it
>>>> is cleaner to just keep the unlock in loop here?
>>>>
>>>> Best Regards,
>>>> Huang Ying
>>>>
>>>>> +                     return;
>>>>>               }
>>>>>       }
>>>
>>> And turn this:
>>>
>>>>> -     BUG_ON(nbits > 0);
>>>>> -     read_unlock(&pool->lock);
>>>>> +     rcu_read_unlock();
>>>>> +     BUG();
>>>
>>> into:
>>>
>>>   BUG_ON(remain);
>>>   rcu_read_unlock();
>>>
>>> Does that look OK to you ? On the plus side, you end up having a single
>>> BUG_ON() in the function.
>>
>> I am afraid this make code a little harder to be understood.  Why do you
>> hate unlock in loop so much?  It is common in kernel and I think most
>> kernel developers are familiar with it.
> 
> I'm fine either way for this function, no strong opinion on this one.

Thanks.  I will keep this function and change the other one
(gen_pool_alloc).

Best Regards,
Huang Ying

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html