That's all nice, but does it improve performance? Have you been able to
measure a performance difference with that code? Were you targeting a
specific inefficiency you had seen, e.g. with a CPU profiler?

Marek

On Thu, Jun 22, 2017 at 8:19 PM, axie <axie at amd.com> wrote:
> To clarify, local IRQ is disabled by calling raw_local_irq_save(flags);
>
> Function __lock_acquire double checks that the local IRQ is really
> disabled.
>
>
> On 2017-06-22 01:34 PM, axie wrote:
>>
>> Hi Marek,
>>
>> Spin lock and spin unlock are fast. But they are not so fast compared
>> with an atomic, which is a single CPU instruction on x86.
>>
>> 1. A spinlock does NOT allow preemption on the local CPU. Let us have
>> a look at how spin_lock is implemented.
>>
>> static inline void __raw_spin_lock(raw_spinlock_t *lock)
>> {
>>         preempt_disable();  <<<<-- This acts as a compiler barrier too.
>>         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>>         LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>> }
>>
>> 2. The function __lock_acquire is called by spinlock. The function is
>> so long that I will not attach all of it here.
>>
>> There are atomic operations inside, 12 metadata updates and 14 if
>> statements, and it calls quite a few other functions.
>>
>> Note that it requires IRQs to be disabled...
>>
>> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                           int trylock, int read, int check, int hardirqs_off,
>>                           struct lockdep_map *nest_lock, unsigned long ip,
>>                           int references, int pin_count)
>> {
>>         struct task_struct *curr = current;
>>         struct lock_class *class = NULL;
>>         struct held_lock *hlock;
>>         unsigned int depth;
>>         int chain_head = 0;
>>         int class_idx;
>>         u64 chain_key;
>>
>>         if (unlikely(!debug_locks))
>>                 return 0;
>>
>>         /*
>>          * Lockdep should run with IRQs disabled, otherwise we could
>>          * get an interrupt which would want to take locks, which would
>>          * end up in lockdep and have you got a head-ache already?
>>          */
>>         if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))  <<<< Checks IRQs are disabled
>>                 return 0;
>>
>>         ....
>>
>> 3. Another function called by spinlock at a higher level:
>>
>> void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                   int trylock, int read, int check,
>>                   struct lockdep_map *nest_lock, unsigned long ip)
>> {
>>         unsigned long flags;
>>
>>         if (unlikely(current->lockdep_recursion))
>>                 return;
>>
>>         raw_local_irq_save(flags);
>>         check_flags(flags);
>>
>>         current->lockdep_recursion = 1;
>>         trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
>>         __lock_acquire(lock, subclass, trylock, read, check,
>>                        irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
>>         current->lockdep_recursion = 0;
>>         raw_local_irq_restore(flags);
>> }
>>
>>
>> Thanks,
>>
>> Alex Bin
>>
>>
>> On 2017-06-22 12:27 PM, Marek Olšák wrote:
>>>
>>> On Thu, Jun 22, 2017 at 5:33 PM, Xie, AlexBin <AlexBin.Xie at amd.com> wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> In fact, the change from spinlock to atomic is quite painful. When I
>>>> started, I thought it was easy, but later I found there might be race
>>>> conditions here and there. Now I think the change looks more robust.
>>>> In the kernel source, several other drivers use the same trick.
>>>>
>>>> On the other hand, I think the logic itself might be optimized
>>>> considering the locking. But I had spent quite some effort to
>>>> maintain the original logic.
>>>
>>> It seems quite complicated and I don't know if there is any
>>> performance benefit.
>>> Spinlocks are nice because they allow preemption.
>>>
>>> It would be more interesting to merge the CS and BO_LIST ioctls into
>>> one.
>>>
>>> Marek
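
[For readers following along: a minimal, hypothetical sketch of the kind
of spinlock-to-atomic conversion being discussed. The struct and function
names are invented for illustration and are not the actual amdgpu code;
only the kernel APIs (spin_lock, atomic_inc_not_zero) are real.]

#include <linux/atomic.h>
#include <linux/spinlock.h>

struct obj {
        spinlock_t lock;        /* old scheme: guards refcount */
        unsigned int refcount;
        atomic_t arefcount;     /* new scheme: no lock needed */
};

/* Old: read-modify-write under a spinlock. Taking the lock also calls
 * preempt_disable(), and with lockdep enabled goes through the
 * lock_acquire() path quoted above. */
static bool obj_get_locked(struct obj *o)
{
        bool alive;

        spin_lock(&o->lock);
        alive = o->refcount != 0;
        if (alive)
                o->refcount++;
        spin_unlock(&o->lock);
        return alive;
}

/* New: a single atomic read-modify-write (a lock cmpxchg loop on x86),
 * with no preemption disable and no lockdep bookkeeping. */
static bool obj_get_atomic(struct obj *o)
{
        return atomic_inc_not_zero(&o->arefcount);
}

Note that a naive split into atomic_read() followed by atomic_inc()
would reintroduce exactly the races Alex mentions; the check and the
increment have to stay one atomic operation, which is why the conversion
is harder than it first looks.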