I've dropped the io_queue type from housekeeping code, because of 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed interrupts"). This convienced me that the original goal of the managed_irq argument was to move away any noise from the isolcpus. So let's just use it and if there are real users of the current behavior we can still add it back the. I hope that Ming will chime in eventually. The rest of the changes are pretty small, splitting one patch and adding documentation. Initial cover letter: The nvme-pci driver is ignoring the isolcpus configuration. There were several attempts to fix this in the past [1][2]. This is another attempt but this time trying to address the feedback and solve it in the core code. The first patch introduces a new option for isolcpus 'io_queue', but I'm not really sure if this is needed and we could just use the managed_irq option instead. I guess depends if there is an use case which depens on queues on the isolated CPUs. The second patch introduces a new block layer helper which returns the number of possible queues. I suspect it would makes sense also to make this helper a bit smarter and also consider the number of queues the hardware supports. And the last patch updates the group_cpus_evenly function so that it uses only the housekeeping CPUs when they are defined Note this series is not addressing the affinity setting of the admin queue (queue 0). I'd like to address this after we agreed on how to solve this. Currently, the admin queue affinity can be controlled by the irq_afffinity command line option, so there is at least a workaround for it. Baseline: available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 node 0 size: 1536 MB node 0 free: 1227 MB node 1 cpus: 4 5 6 7 node 1 size: 1729 MB node 1 free: 1422 MB node distances: node 0 1 0: 10 20 1: 20 10 options nvme write_queues=4 poll_queues=4 55: 0 41 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 0-edge nvme0q0 affinity: 0-3 63: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 1-edge nvme0q1 affinity: 4-5 64: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 2-edge nvme0q2 affinity: 6-7 65: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 3-edge nvme0q3 affinity: 0-1 66: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 4-edge nvme0q4 affinity: 2-3 67: 0 0 0 0 24 0 0 0 PCI-MSIX-0000:00:05.0 5-edge nvme0q5 affinity: 4 68: 0 0 0 0 0 1 0 0 PCI-MSIX-0000:00:05.0 6-edge nvme0q6 affinity: 5 69: 0 0 0 0 0 0 41 0 PCI-MSIX-0000:00:05.0 7-edge nvme0q7 affinity: 6 70: 0 0 0 0 0 0 0 3 PCI-MSIX-0000:00:05.0 8-edge nvme0q8 affinity: 7 71: 1 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 9-edge nvme0q9 affinity: 0 72: 0 18 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 10-edge nvme0q10 affinity: 1 73: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 11-edge nvme0q11 affinity: 2 74: 0 0 0 3 0 0 0 0 PCI-MSIX-0000:00:05.0 12-edge nvme0q12 affinity: 3 queue mapping for /dev/nvme0n1 hctx0: default 4 5 hctx1: default 6 7 hctx2: default 0 1 hctx3: default 2 3 hctx4: read 4 hctx5: read 5 hctx6: read 6 hctx7: read 7 hctx8: read 0 hctx9: read 1 hctx10: read 2 hctx11: read 3 hctx12: poll 4 5 hctx13: poll 6 7 hctx14: poll 0 1 hctx15: poll 2 3 PCI name is 00:05.0: nvme0n1 irq 55, cpu list 0-3, effective list 1 irq 63, cpu list 4-5, effective list 5 irq 64, cpu list 6-7, effective list 7 irq 65, cpu list 0-1, effective list 1 irq 66, cpu list 2-3, effective list 3 irq 67, cpu list 4, effective list 4 irq 68, cpu list 5, effective list 5 irq 69, cpu list 6, effective list 6 irq 70, cpu list 7, effective list 7 irq 71, cpu list 0, effective list 0 irq 72, cpu list 1, effective list 1 irq 73, cpu list 2, effective list 2 irq 74, cpu list 3, effective list 3 * patched: 48: 0 0 33 0 0 0 0 0 PCI-MSIX-0000:00:05.0 0-edge nvme0q0 affinity: 0-3 58: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 1-edge nvme0q1 affinity: 4 59: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 2-edge nvme0q2 affinity: 5 60: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 3-edge nvme0q3 affinity: 0 61: 0 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 4-edge nvme0q4 affinity: 1 62: 0 0 0 0 45 0 0 0 PCI-MSIX-0000:00:05.0 5-edge nvme0q5 affinity: 4 63: 0 0 0 0 0 12 0 0 PCI-MSIX-0000:00:05.0 6-edge nvme0q6 affinity: 5 64: 2 0 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 7-edge nvme0q7 affinity: 0 65: 0 35 0 0 0 0 0 0 PCI-MSIX-0000:00:05.0 8-edge nvme0q8 affinity: 1 queue mapping for /dev/nvme0n1 hctx0: default 2 3 4 6 7 hctx1: default 5 hctx2: default 0 hctx3: default 1 hctx4: read 4 hctx5: read 5 hctx6: read 0 hctx7: read 1 hctx8: poll 4 hctx9: poll 5 hctx10: poll 0 hctx11: poll 1 PCI name is 00:05.0: nvme0n1 irq 48, cpu list 0-3, effective list 2 irq 58, cpu list 4, effective list 4 irq 59, cpu list 5, effective list 5 irq 60, cpu list 0, effective list 0 irq 61, cpu list 1, effective list 1 irq 62, cpu list 4, effective list 4 irq 63, cpu list 5, effective list 5 irq 64, cpu list 0, effective list 0 irq 65, cpu list 1, effective list 1 [1] https://lore.kernel.org/lkml/20220423054331.GA17823@xxxxxx/T/#m9939195a465accbf83187caf346167c4242e798d [2] https://lore.kernel.org/linux-nvme/87fruci5nj.ffs@tglx/ Signed-off-by: Daniel Wagner <dwagner@xxxxxxx> --- Changes in v2: - updated documentation - splitted blk/nvme-pci patch - dropped HK_TYPE_IO_QUEUE, use HK_TYPE_MANAGED_IRQ - Link to v1: https://lore.kernel.org/r/20240621-isolcpus-io-queues-v1-0-8b169bf41083@xxxxxxx --- Daniel Wagner (3): blk-mq: add blk_mq_num_possible_queues helper nvme-pci: limit queue count to housekeeping CPUs lib/group_cpus.c: honor housekeeping config when grouping CPUs block/blk-mq-cpumap.c | 20 +++++++++++++ drivers/nvme/host/pci.c | 5 ++-- include/linux/blk-mq.h | 1 + lib/group_cpus.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++-- 4 files changed, 97 insertions(+), 4 deletions(-) --- base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f change-id: 20240620-isolcpus-io-queues-1a88eb47ff8b Best regards, -- Daniel Wagner <dwagner@xxxxxxx>