From: Breno Leitao <leitao@xxxxxxxxxx> Sent: Thursday, January 2, 2025 2:16 AM
>
> On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote:
> > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
> > >
> > > The growth check should stay with the atomic_inc. Something like
> > > this should work:
> >
> > OK I've applied your patch with the atomic_inc move.
>
> Sorry, I was on vacation, and I am back now. Let me know if you need
> anything further.
>
> Thanks for fixing it,
> --breno

Breno and Herbert --

This patch seems to break things in linux-next. I'm testing with
linux-next20250108 in a VM in the Azure public cloud. The Mellanox mlx5
ethernet NIC in the VM is failing to get set up.

I bisected to commit e1d3422c95f0 ("rhashtable: Fix potential deadlock
by moving schedule_work outside lock"), then debugged why opening the
mlx5 NIC device is failing. The failure is in the XDP code in function
__xdp_reg_mem_model(), where the call to rhashtable_insert_slow() is
returning -E2BIG. The problem does not occur when the commit is
reverted.

The function call stack is this:

  dev_open()
  __dev_open()
  mlx5e_open()
  mlx5e_open_locked()
  mlx5e_open_channels()
  mlx5e_open_channel()
  mlx5e_open_queues()
  mlx5e_open_rxq_rq()
  mlx5e_open_rq()
  mlx5e_alloc_rq()
  xdp_rxq_info_reg_mem_model()
  __xdp_reg_mem_model()
  rhashtable_insert_slow()

I have not debugged further as I don't know anything about the
rhashtable code or the XDP code. The only repro I have is a VM in
Azure. I thought I'd ask you (Breno and Herbert) to review the patch
again and see if there's a path that could cause the hash table to be
incorrectly detected as full.

I've included the linux-hyperv mailing list and the mlx5 driver
maintainers on this email. Someone involved with Azure/Hyper-V or the
mlx5 driver may have seen the problem, and I want to try to avoid
duplicative debugging.

Let me know if there's something I can do to help debug further.

Thanks,

Michael Kelley