Hi,

On 11/29/2022 2:06 PM, Tonghao Zhang wrote:
> On Tue, Nov 29, 2022 at 12:32 PM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>> Hi,
>>
>> On 11/29/2022 5:55 AM, Hao Luo wrote:
>>> On Sun, Nov 27, 2022 at 7:15 PM Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx> wrote:
>>> Hi Tonghao,
>>>
>>> With a quick look at the htab_lock_bucket() and your problem
>>> statement, I agree with Hou Tao that using hash &
>>> min(HASHTAB_MAP_LOCK_MASK, n_bucket - 1) to index in map_locked seems
>>> to fix the potential deadlock. Can you actually send your changes as
>>> v2 so we can take a look and better help you? Also, can you explain
>>> your solution in your commit message? Right now, your commit message
>>> has only a problem statement and is not very clear. Please include
>>> more details on what you do to fix the issue.
>>>
>>> Hao
>> It would be better if the test case below can be rewritten as a bpf selftest.
>> Please see comments below on how to improve it and reproduce the deadlock.
>>>> Hi
>>>> only a warning from lockdep.
>> Thanks for your detailed instructions. I can reproduce the warning by using your
>> setup. I am not a lockdep expert, but it seems that fixing such a warning requires
>> setting different lockdep classes for different buckets. Because we use map_locked
>> to protect the acquisition of the bucket lock, I think we can define a
>> lock_class_key array in bpf_htab (e.g., lockdep_key[HASHTAB_MAP_LOCK_COUNT]) and
>> initialize the bucket lock accordingly.
> Hi
> Thanks for your reply. Defining the lock_class_key array looks good.
> Last question: how about using raw_spin_trylock_irqsave()? If the
> bucket is locked on the same or another CPU,
> raw_spin_trylock_irqsave() will return false, and we should return
> -EBUSY in htab_lock_bucket().
>
> static inline int htab_lock_bucket(struct bucket *b,
>                                    unsigned long *pflags)
> {
>     unsigned long flags;
>
>     if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
>         return -EBUSY;
>
>     *pflags = flags;
>     return 0;
> }
The flaw of the trylock solution is that it cannot distinguish between a
deadlock and a lock under high contention: in both cases the trylock fails and
the caller gets -EBUSY. So I don't think it is a good idea to do that.
>
>>>> 1. the kernel .config
>>>> #
>>>> # Debug Oops, Lockups and Hangs
>>>> #
>>>> CONFIG_PANIC_ON_OOPS=y
>>>> CONFIG_PANIC_ON_OOPS_VALUE=1
>>>> CONFIG_PANIC_TIMEOUT=0
>>>> CONFIG_LOCKUP_DETECTOR=y
>>>> CONFIG_SOFTLOCKUP_DETECTOR=y
>>>> # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
>>>> CONFIG_HARDLOCKUP_DETECTOR_PERF=y
>>>> CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
>>>> CONFIG_HARDLOCKUP_DETECTOR=y
>>>> CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
>>>> CONFIG_DETECT_HUNG_TASK=y
>>>> CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
>>>> # CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
>>>> # CONFIG_WQ_WATCHDOG is not set
>>>> # CONFIG_TEST_LOCKUP is not set
>>>> # end of Debug Oops, Lockups and Hangs
>>>>
>>>> 2. bpf.c, the map size is 2.
>>>> struct {
>>>>     __uint(type, BPF_MAP_TYPE_HASH);
>> Adding __uint(map_flags, BPF_F_ZERO_SEED); to ensure there will be no seed for
>> hash calculation, so we can use key=4 and key=20 to construct the case that
>> these two keys have the same bucket index but have different map_locked index.
>>>>     __uint(max_entries, 2);
>>>>     __uint(key_size, sizeof(unsigned int));
>>>>     __uint(value_size, sizeof(unsigned int));
>>>> } map1 SEC(".maps");
>>>>
>>>> static int bpf_update_data()
>>>> {
>>>>     unsigned int val = 1, key = 0;
>> key = 20
>>>>     return bpf_map_update_elem(&map1, &key, &val, BPF_ANY);
>>>> }
>>>>
>>>> SEC("kprobe/ip_rcv")
>>>> int bpf_prog1(struct pt_regs *regs)
>>>> {
>>>>     bpf_update_data();
>>>>     return 0;
>>>> }
>> kprobe on ip_rcv is unnecessary, you can just remove it.
>>>> SEC("tracepoint/nmi/nmi_handler")
>>>> int bpf_prog2(struct pt_regs *regs)
>>>> {
>>>>     bpf_update_data();
>>>>     return 0;
>>>> }
>> Please use SEC("fentry/nmi_handle") instead of SEC("tracepoint") and unfold
>> bpf_update_data(), because the running of a bpf program on a tracepoint will be
>> blocked by bpf_prog_active, which will be increased by bpf_map_update_elem through
>> bpf_disable_instrumentation().
>>>> char _license[] SEC("license") = "GPL";
>>>> unsigned int _version SEC("version") = LINUX_VERSION_CODE;
>>>>
>>>> 3. bpf loader.
>>>> #include "kprobe-example.skel.h"
>>>>
>>>> #include <unistd.h>
>>>> #include <errno.h>
>>>>
>>>> #include <bpf/bpf.h>
>>>>
>>>> int main()
>>>> {
>>>>     struct kprobe_example *skel;
>>>>     int map_fd, prog_fd;
>>>>     int i;
>>>>     int err = 0;
>>>>
>>>>     skel = kprobe_example__open_and_load();
>>>>     if (!skel)
>>>>         return -1;
>>>>
>>>>     err = kprobe_example__attach(skel);
>>>>     if (err)
>>>>         goto cleanup;
>>>>
>>>>     /* all libbpf APIs are usable */
>>>>     prog_fd = bpf_program__fd(skel->progs.bpf_prog1);
>>>>     map_fd = bpf_map__fd(skel->maps.map1);
>>>>
>>>>     printf("map_fd: %d\n", map_fd);
>>>>
>>>>     unsigned int val = 0, key = 0;
>>>>
>>>>     while (1) {
>>>>         bpf_map_delete_elem(map_fd, &key);
>> Not needed either. Doing only bpf_map_update_elem() is OK. Also change key=0 to
>> key=4, so it will have the same bucket index as key=20 but have different
>> map_locked index.
>>>>         bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
>>>>     }
>> Also need to pin the process on a specific CPU (e.g., CPU 0)
>>>> cleanup:
>>>>     kprobe_example__destroy(skel);
>>>>     return err;
>>>> }
>>>>
>>>> 4. run the bpf loader and perf record for nmi interrupts. the warning occurs
>> For the perf event, you can reference prog_tests/find_vma.c on how to use
>> perf_event_open to trigger a perf nmi interrupt. The perf event also needs to
>> be pinned on a specific CPU, as the caller of bpf_map_update_elem() does.
>>
>>>> --
>>>> Best regards, Tonghao
>>> .
>
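
To make the index mismatch discussed above concrete, here is a minimal
user-space sketch of the arithmetic. It is an illustration only, not the
kernel code: the helper names (bucket_index, lock_index_current,
lock_index_proposed) are hypothetical, HASHTAB_MAP_LOCK_COUNT is assumed to be
8 as in kernel/bpf/hashtab.c, and the hash values used in the note below are
made up rather than real jhash outputs for key=4 and key=20.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror kernel/bpf/hashtab.c; check your tree. */
#define HASHTAB_MAP_LOCK_COUNT 8
#define HASHTAB_MAP_LOCK_MASK  (HASHTAB_MAP_LOCK_COUNT - 1)

/* Bucket selection: n_buckets must be a non-zero power of two. */
static uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
    assert(n_buckets != 0 && (n_buckets & (n_buckets - 1)) == 0);
    return hash & (n_buckets - 1);
}

/*
 * map_locked index as computed before the fix: it ignores n_buckets,
 * so two hashes in the same bucket can land on different counters.
 */
static uint32_t lock_index_current(uint32_t hash)
{
    return hash & HASHTAB_MAP_LOCK_MASK;
}

/*
 * map_locked index with the mask suggested in this thread:
 * hash & min(HASHTAB_MAP_LOCK_MASK, n_buckets - 1).
 * Hashes that share a bucket now always share a counter.
 */
static uint32_t lock_index_proposed(uint32_t hash, uint32_t n_buckets)
{
    uint32_t bucket_mask = n_buckets - 1;
    uint32_t mask = bucket_mask < HASHTAB_MAP_LOCK_MASK ?
                    bucket_mask : HASHTAB_MAP_LOCK_MASK;

    return hash & mask;
}
```

With n_buckets = 2 and the made-up hashes 0x2 and 0x4, both land in bucket 0,
but the current scheme gives map_locked indexes 2 and 4, so the second update
(from NMI context) is not rejected by the map_locked check and goes on to take
the same bucket lock. With the proposed mask both hashes map to index 0, so
the nested update should see the counter already held and fail with -EBUSY.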