That's all nice, but does it improve performance? Have you been able to
measure a performance difference with that code? Were you targeting a
specific inefficiency you had seen, e.g. with a CPU profiler?

Marek

On Thu, Jun 22, 2017 at 8:19 PM, axie <axie at amd.com> wrote:
> To clarify, local IRQ is disabled by calling raw_local_irq_save(flags);
>
> Function __lock_acquire double checks that the local IRQ is really
> disabled.
>
>
> On 2017-06-22 01:34 PM, axie wrote:
>>
>> Hi Marek,
>>
>> Spin lock and spin unlock are fast. But they are not so fast compared
>> with an atomic, which is a single CPU instruction on x86.
>>
>> 1. A spinlock does NOT allow preemption on the local CPU. Let us have
>> a look at how spin_lock is implemented.
>>
>> static inline void __raw_spin_lock(raw_spinlock_t *lock)
>> {
>>         preempt_disable();  <<<<-- This acts as a compiler barrier too.
>>         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>>         LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>> }
>>
>> 2. The function __lock_acquire is called by spinlock. The function is
>> so long that I will not attach all of it here.
>>
>> There are atomic operations inside, 12 metadata updates and 14 if
>> statements, and it calls quite a few other functions.
>>
>> Note that it requires IRQs to be disabled...
>>
>> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                           int trylock, int read, int check, int hardirqs_off,
>>                           struct lockdep_map *nest_lock, unsigned long ip,
>>                           int references, int pin_count)
>> {
>>         struct task_struct *curr = current;
>>         struct lock_class *class = NULL;
>>         struct held_lock *hlock;
>>         unsigned int depth;
>>         int chain_head = 0;
>>         int class_idx;
>>         u64 chain_key;
>>
>>         if (unlikely(!debug_locks))
>>                 return 0;
>>
>>         /*
>>          * Lockdep should run with IRQs disabled, otherwise we could
>>          * get an interrupt which would want to take locks, which would
>>          * end up in lockdep and have you got a head-ache already?
>>          */
>>         if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))  <<<< Checks IRQs are disabled
>>                 return 0;
>>
>>         ....
>>
>> 3. Another function called by spinlock at a higher level:
>>
>> void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>                   int trylock, int read, int check,
>>                   struct lockdep_map *nest_lock, unsigned long ip)
>> {
>>         unsigned long flags;
>>
>>         if (unlikely(current->lockdep_recursion))
>>                 return;
>>
>>         raw_local_irq_save(flags);
>>         check_flags(flags);
>>
>>         current->lockdep_recursion = 1;
>>         trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
>>         __lock_acquire(lock, subclass, trylock, read, check,
>>                        irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
>>         current->lockdep_recursion = 0;
>>         raw_local_irq_restore(flags);
>> }
>>
>>
>> Thanks,
>>
>> Alex Bin
>>
>>
>> On 2017-06-22 12:27 PM, Marek Olšák wrote:
>>>
>>> On Thu, Jun 22, 2017 at 5:33 PM, Xie, AlexBin <AlexBin.Xie at amd.com> wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> In fact, the change from spinlock to atomic is quite painful. When I
>>>> started, I thought it was easy, but later I found there might be race
>>>> conditions here and there. Now I think the change looks more robust.
>>>> In the kernel source, several other drivers use the same trick.
>>>>
>>>> On the other hand, I think the logic itself might be optimized
>>>> considering the locking. But I had spent quite some effort to
>>>> maintain the original logic.
>>>
>>> It seems quite complicated and I don't know if there is any
>>> performance benefit.
>>> Spinlocks are nice because they allow preemption.
>>>
>>> It would be more interesting to merge the CS and BO_LIST ioctls into
>>> one.
>>>
>>> Marek
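
[For readers following along: a minimal, hypothetical sketch of the kind
of spinlock-to-atomic conversion being discussed. The struct and function
names are invented for illustration and are not the actual amdgpu code;
only the kernel APIs (spin_lock, atomic_inc_not_zero) are real.]

#include <linux/atomic.h>
#include <linux/spinlock.h>

struct obj {
        spinlock_t lock;        /* old scheme: guards refcount */
        unsigned int refcount;
        atomic_t arefcount;     /* new scheme: no lock needed */
};

/* Old: read-modify-write under a spinlock. Taking the lock also calls
 * preempt_disable(), and with lockdep enabled goes through the
 * lock_acquire() path quoted above. */
static bool obj_get_locked(struct obj *o)
{
        bool alive;

        spin_lock(&o->lock);
        alive = o->refcount != 0;
        if (alive)
                o->refcount++;
        spin_unlock(&o->lock);
        return alive;
}

/* New: a single atomic read-modify-write (a lock cmpxchg loop on x86),
 * with no preemption disable and no lockdep bookkeeping. */
static bool obj_get_atomic(struct obj *o)
{
        return atomic_inc_not_zero(&o->arefcount);
}

Note that a naive split into atomic_read() followed by atomic_inc()
would reintroduce exactly the races Alex mentions; the check and the
increment have to stay one atomic operation, which is why the conversion
is harder than it first looks.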