Jakub Kicinski <kuba@xxxxxxxxxx> writes:

> On Mon, 11 Nov 2024 18:09:01 +0100 Petr Machata wrote:
>> Check that only one notification is produced for various FDB edit
>> operations.
>>
>> Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
>> action plus corresponding defer is bound to come up often, and a dedicated
>> vocabulary to capture it will be handy. tunnel_create() and vlan_create()
>> from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
>> so I tried to go in the opposite direction with these ones, and wrapped
>> only the bare minimum to schedule a corresponding cleanup.
>
> Looks like it fails about half of the time :(
>
> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=fdb-notify&br-cnt=200

OK, I can't reproduce this. I tried in a VM and on actual HW, with and
without debug, and had no luck.

But I see basically two failures:

- A "0 seen, 1 expected", which... I don't know, maybe it could just be a
  misplaced sleep. I don't see how, though: it's a deterministic scenario,
  so there shouldn't be anything racy here. Either it emits or it doesn't,
  so some buffering issue is the only thing I can think of.

- Deadlocks. E.g. this one, which looks like it deadlocked and timed out
  ("bad unlock balance detected" followed by "blocked for more than 122
  seconds" et al.):

  https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/846621/18-fdb-notify-sh/

Like... how could this patchset even theoretically cause a deadlock?