Hi,

On 11/30/2022 1:55 PM, Tonghao Zhang wrote:
> On Wed, Nov 30, 2022 at 12:13 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 11/30/2022 10:47 AM, Tonghao Zhang wrote:
>>> On Wed, Nov 30, 2022 at 9:50 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>> Hi Hao,
>>>>
>>>> On 11/30/2022 3:36 AM, Hao Luo wrote:
>>>>> On Tue, Nov 29, 2022 at 9:32 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>>>>>> Just to be clear, I meant to refactor htab_lock_bucket() into a try
>>>>>> lock pattern. Also after a second thought, the below suggestion doesn't
>>>>>> work. I think the proper way is to make htab_lock_bucket() as a
>>>>>> raw_spin_trylock_irqsave().
>>>>>>
>>>>>> Regards,
>>>>>> Boqun
>>>>>>
>>>>> The potential deadlock happens when the lock is contended from the
>>>>> same cpu. When the lock is contended from a remote cpu, we would like
>>>>> the remote cpu to spin and wait, instead of giving up immediately. As
>>>>> this gives better throughput. So replacing the current
>>>>> raw_spin_lock_irqsave() with trylock sacrifices this performance gain.
>>>>>
>>>>> I suspect the source of the problem is the 'hash' that we used in
>>>>> htab_lock_bucket(). The 'hash' is derived from the 'key', I wonder
>>>>> whether we should use a hash derived from 'bucket' rather than from
>>>>> 'key'. For example, from the memory address of the 'bucket'. Because,
>>>>> different keys may fall into the same bucket, but yield different
>>>>> hashes. If the same bucket can never have two different 'hashes' here,
>>>>> the map_locked check should behave as intended. Also because
>>>>> ->map_locked is per-cpu, execution flows from two different cpus can
>>>>> both pass.
>>>> The warning from lockdep is due to the reason the bucket lock A is used in a
>>>> no-NMI context firstly, then the same bucket lock is used in a NMI context, so
>>> Yes, I tested lockdep too, we can't use the lock in NMI(but only
>>> try_lock work fine) context if we use them in no-NMI context.
>>> otherwise the lockdep prints the warning.
>>> * for the dead-lock case: we can use the
>>> 1. hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1)
>>> 2. or hash bucket address.
>> Using the computed hash will be better than the hash bucket address, because the hash
>> buckets are allocated sequentially.
>>> * for lockdep warning, we should use in_nmi check with map_locked.
>>>
>>> BTW, the patch doesn't work, so we can remove the lock_key
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50eb518e262fa06bd334e6eec172eaf5d7a5bd9
>>>
>>> static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>>                                    struct bucket *b, u32 hash,
>>>                                    unsigned long *pflags)
>>> {
>>>         unsigned long flags;
>>>
>>>         hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1);
>>>
>>>         preempt_disable();
>>>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>>                 __this_cpu_dec(*(htab->map_locked[hash]));
>>>                 preempt_enable();
>>>                 return -EBUSY;
>>>         }
>>>
>>>         if (in_nmi()) {
>>>                 if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
>>>                         return -EBUSY;
>> The only purpose of trylock here is to make lockdep happy and it may lead to
>> unnecessary -EBUSY error for htab operations in NMI context. I still prefer to add
>> a virtual lock-class for map_locked to fix the lockdep warning. So could you use
> Hi, what is virtual lock-class ? Can you give me an example of what you mean?
If LOCKDEP is enabled, raw_spinlock will add dep_map in the definition and it
also calls lock_acquire() and lock_release() to assist the deadlock check. Now
map_locked is not a lock but it acts like a raw_spin_trylock, so we need to add
dep_map to it manually, and then also call lock_acquire(trylock=1) and
lock_release() before increasing and decreasing map_locked. You can refer to
the implementation of raw_spin_trylock and raw_spin_unlock for more details.
>> separated patches to fix the potential dead-lock and the lockdep warning ?
>> It will be better if you can also add a bpf selftest for the deadlock problem as said before.
>>
>> Thanks,
>> Tao
>>>         } else {
>>>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
>>>         }
>>>
>>>         *pflags = flags;
>>>         return 0;
>>> }
>>>
>>>
>>>> lockdep deduces that may be a dead-lock. I have already tried to use the same
>>>> map_locked for keys with the same bucket, the dead-lock is gone, but still got
>>>> lockdep warning.
>>>>> Hao
>>>>> .
>
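
To make the dead-lock case concrete, here is a small userspace sketch of the
index computation only. It is not kernel code. The helper names
(bucket_index, map_locked_index_old, map_locked_index_new) are made up for
illustration, and the value 8 for HASHTAB_MAP_LOCK_COUNT is assumed to match
kernel/bpf/hashtab.c. It shows that two hashes landing in the same bucket can
pick different map_locked slots with the plain mask, but always pick the same
slot once the mask is clamped with n_buckets - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror kernel/bpf/hashtab.c; treat the value as an assumption. */
#define HASHTAB_MAP_LOCK_COUNT 8
#define HASHTAB_MAP_LOCK_MASK  (HASHTAB_MAP_LOCK_COUNT - 1)

/* Hypothetical helper: smaller of two u32 values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Bucket selection, assuming n_buckets is a non-zero power of two. */
static uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
	assert(n_buckets != 0 && (n_buckets & (n_buckets - 1)) == 0);
	return hash & (n_buckets - 1);
}

/* Plain mask: the map_locked slot depends on the hash alone. */
static uint32_t map_locked_index_old(uint32_t hash)
{
	return hash & HASHTAB_MAP_LOCK_MASK;
}

/*
 * Clamped mask, as in option 1 above: the slot uses only bits that also
 * take part in bucket selection, so one bucket maps to one slot.
 */
static uint32_t map_locked_index_new(uint32_t hash, uint32_t n_buckets)
{
	return hash & min_u32(HASHTAB_MAP_LOCK_MASK, n_buckets - 1);
}
```

With n_buckets = 4, hashes 1 and 5 share bucket 1. The plain mask gives
slots 1 and 5, so a nested update on the same cpu passes the map_locked check
and then spins on the bucket lock it already holds. The clamped mask gives
slot 1 for both, so the nested update gets -EBUSY instead.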