On Fri, Oct 18, 2024 at 05:13:43PM +0000, Joe Damato wrote:
> Link queues to NAPI instances via netdev-genl API so that users can
> query this information with netlink. Handle a few cases in the driver:
>   1. Link/unlink the NAPIs when XDP is enabled/disabled
>   2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled
>
> Example output when IGC_FLAG_QUEUE_PAIRS is enabled:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
>                          --dump queue-get --json='{"ifindex": 2}'
>
> [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
>  {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
>  {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
>  {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
>  {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
>  {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'},
>  {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
>  {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
>
> Since IGC_FLAG_QUEUE_PAIRS is enabled, you'll note that the same NAPI ID
> is present for both rx and tx queues at the same index, for example
> index 0:
>
> {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
>
> To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using
> the grub command line option "maxcpus=2" to force
> igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS.
>
> Example output when IGC_FLAG_QUEUE_PAIRS is disabled:
>
> $ lscpu | grep "On-line CPU"
> On-line CPU(s) list: 0,2
>
> $ ethtool -l enp86s0 | tail -5
> Current hardware settings:
> RX: n/a
> TX: n/a
> Other: 1
> Combined: 2
>
> $ cat /proc/interrupts | grep enp
> 144: [...] enp86s0
> 145: [...] enp86s0-rx-0
> 146: [...] enp86s0-rx-1
> 147: [...] enp86s0-tx-0
> 148: [...] enp86s0-tx-1
>
> 1 "other" IRQ, and 2 IRQs for each of RX and TX, so we expect netlink to
> report 4 IRQs with unique NAPI IDs:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
>                          --dump napi-get --json='{"ifindex": 2}'
> [{'id': 8196, 'ifindex': 2, 'irq': 148},
>  {'id': 8195, 'ifindex': 2, 'irq': 147},
>  {'id': 8194, 'ifindex': 2, 'irq': 146},
>  {'id': 8193, 'ifindex': 2, 'irq': 145}]
>
> Now we examine which queues these NAPIs are associated with, expecting
> that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will
> have its own NAPI instance:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
>                          --dump queue-get --json='{"ifindex": 2}'
> [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
>  {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
>  {'id': 0, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
>  {'id': 1, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
>
> Signed-off-by: Joe Damato <jdamato@xxxxxxxxxx>
> ---
>  v3:
>    - Replace igc_unset_queue_napi with igc_set_queue_napi(adapter, i,
>      NULL), as suggested by Vinicius Costa Gomes
>    - Simplify implementation of igc_set_queue_napi as suggested by Kurt
>      Kanzenbach, with a tweak to use ring->queue_index
>
>  v2:
>    - Update commit message to include tests for IGC_FLAG_QUEUE_PAIRS
>      disabled
>    - Refactored code to move napi queue mapping and unmapping to helper
>      functions igc_set_queue_napi and igc_unset_queue_napi
>    - Adjust the code to handle IGC_FLAG_QUEUE_PAIRS disabled
>    - Call helpers to map/unmap queues to NAPIs in igc_up, __igc_open,
>      igc_xdp_enable_pool, and igc_xdp_disable_pool
>
>  drivers/net/ethernet/intel/igc/igc.h      |  2 ++
>  drivers/net/ethernet/intel/igc/igc_main.c | 33 ++++++++++++++++++++---
>  drivers/net/ethernet/intel/igc/igc_xdp.c  |  2 ++
>  3 files changed, 33 insertions(+), 4 deletions(-)

After the e1000 bug report came in [1], I took another look at this to
make sure that RTNL is held whenever igc_set_queue_napi is called, and
there may be two locations I've missed:

1. igc_resume, which calls __igc_open
2. igc_io_error_detected, which calls igc_down

In both cases, I think the code can be modified to hold rtnl around
the calls to __igc_open and igc_down.

Let me know what you think.

If you agree that I should hold rtnl in both of those cases, what is
the best way to proceed:
  - send a v4, or
  - wait for this to get merged (since I got the notification it was
    pulled into intel-next) and send a fix?

Here's the full analysis I came up with; I tried to be thorough, but
it is certainly possible I missed a call site:

For the up case:

- igc_up:
  - called from igc_reinit_locked, which is called via:
    - igc_reset_task (rtnl is held)
    - igc_set_features (ndo_set_features, which itself has an
      ASSERT_RTNL)
    - various places in igc_ethtool (set_priv_flags, nway_reset,
      ethtool_set_eee), all of which have RTNL held
    - igc_change_mtu, which also has RTNL held
- __igc_open:
  - called from igc_resume, which may need an rtnl_lock
  - called from igc_open, which is:
    - called from igc_io_resume, where rtnl is held
    - called from igc_reinit_queues, only via ethtool set_channels,
      where rtnl is held
    - ndo_open, where rtnl is held

For the down case:

- igc_down:
  - called from various ethtool locations (set_ringparam,
    set_pauseparam, set_link_ksettings), all of which hold rtnl
  - called from igc_io_error_detected, which may need an rtnl_lock
  - called from igc_reinit_locked, which is fine, as described above
  - called from igc_change_mtu, which is fine, as described above
  - called from __igc_close, which is:
    - called from __igc_shutdown, which holds rtnl
    - called from igc_reinit_queues, which is fine, as described above
    - called from igc_close, which is ndo_close
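
The call-site analysis above can also be written down as a small caller graph and walked mechanically. The Python sketch below is only a model of the list in this email, not something derived from the driver source: the `CALLERS` and `RTNL_HELD` tables are hand-transcribed, `suspect_paths` is a hypothetical helper name, the ethtool call sites are collapsed into one `igc_ethtool` entry, and treating igc_close (ndo_close) as running under rtnl is an assumption the email only implies.

```python
# Model of the RTNL call-site analysis from this email (hand-transcribed,
# not generated from the igc driver source).

# callee -> functions that call it, as listed in the analysis
CALLERS = {
    "igc_up": ["igc_reinit_locked"],
    "igc_reinit_locked": ["igc_reset_task", "igc_set_features",
                          "igc_ethtool", "igc_change_mtu"],
    "__igc_open": ["igc_resume", "igc_open"],
    "igc_open": ["igc_io_resume", "igc_reinit_queues", "ndo_open"],
    "igc_down": ["igc_ethtool", "igc_io_error_detected",
                 "igc_reinit_locked", "igc_change_mtu", "__igc_close"],
    "__igc_close": ["__igc_shutdown", "igc_reinit_queues", "igc_close"],
}

# Entry points and whether the email says rtnl is held there.
# False means "may need an rtnl_lock" according to the analysis.
RTNL_HELD = {
    "igc_reset_task": True,
    "igc_set_features": True,
    "igc_ethtool": True,            # all ethtool call sites, collapsed
    "igc_change_mtu": True,
    "igc_io_resume": True,
    "igc_reinit_queues": True,      # only reached via ethtool set_channels
    "ndo_open": True,
    "__igc_shutdown": True,
    "igc_close": True,              # ndo_close; assumed, only implied above
    "igc_resume": False,
    "igc_io_error_detected": False,
}


def suspect_paths(func, path=()):
    """Return call paths (entry point first) that reach func without rtnl."""
    path = (func,) + path
    if func in RTNL_HELD:
        # Reached an entry point: report the path only if rtnl is not held.
        return [] if RTNL_HELD[func] else [path]
    found = []
    for caller in CALLERS.get(func, []):
        found += suspect_paths(caller, path)
    return found


if __name__ == "__main__":
    for target in ("igc_up", "__igc_open", "igc_down"):
        for p in suspect_paths(target):
            print(" -> ".join(p))
```

Under these assumptions the only paths reported are the two flagged above (igc_resume into __igc_open, and igc_io_error_detected into igc_down). If a call site was missed in the analysis, it would need to be added to both tables by hand.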