On Thu, Dec 19, 2024 at 04:38:43PM +0100, Daniel Wagner wrote:
> When isolcpus=managed_irq is enabled all hardware queues should run on
> the housekeeping CPUs only. Thus ignore the affinity mask provided by
> the driver.

Compared with the in-tree code, the above description is misleading.

- irq core code respects isolated CPUs by trying to exclude isolated
  CPUs from effective masks

- blk-mq won't schedule kblockd on isolated CPUs

If applications aren't run on isolated CPUs, IO interrupts usually won't
be triggered on isolated CPUs, so isolated CPUs are _not_ ignored.

> On Thu, Dec 19, 2024 at 05:20:44PM +0800, Ming Lei wrote:
> > > +    cpumask_andnot(isol_mask,
> > > +                   cpu_possible_mask,
> > > +                   housekeeping_cpumask(HK_TYPE_MANAGED_IRQ));
> > > +
> > > +    for_each_cpu(cpu, isol_mask) {
> > > +            qmap->mq_map[cpu] = qmap->queue_offset + queue;
> > > +            queue = (queue + 1) % qmap->nr_queues;
> > > +    }
> >
> > Looks the IO hang issue in V3 isn't addressed yet, is it?
> >
> > https://lore.kernel.org/linux-block/ZrtX4pzqwVUEgIPS@fedora/
>
> I've added an explanation in the cover letter why this is not
> addressed. From the cover letter:
>
> I've experimented for a while and all solutions I came up were horrible
> hacks (the hotpath needs to be touched) and I don't want to slow down all
> other users (which are almost everyone). IMO, it's just not worth trying

IMO, this patchset is an improvement on the existing best-effort
approach, which works fine most of the time, so why do you think it
slows down everyone?

> to fix this corner case. If the user is using isolcpus and does CPU
> hotplug, we can expect that the user can also first offline the isolated
> CPUs. I've discussed this topic during ALPSS and the room came to the
> same conclusion. Thus I just added a patch which issues a warning that
> IOs are likely to hang.

If the change needs userspace cooperation for using 'managed_irq', the
exact behavior needs to be documented in both this commit and
Documentation/admin-guide/kernel-parameters.txt, instead of in the cover
letter only.

But this patch does cause a regression for old applications which can't
follow the newly introduced rule:

```
If the user is using isolcpus and does CPU hotplug, we can expect that the
user can also first offline the isolated CPUs.
```

Thanks,
Ming
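
For anyone following the thread, the round-robin assignment in the hunk
quoted above can be modeled in userspace. This is only a rough sketch
under stated assumptions, not the patch itself: NR_CPUS, the
housekeeping[] array and map_isolated_cpus() are hypothetical stand-ins
for the kernel cpumask helpers, and `queue` is assumed to start at 0
(the quoted hunk does not show its initial value).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/*
 * Userspace model of the quoted loop: every CPU that is not a
 * housekeeping CPU is assigned to a hardware queue in round-robin
 * order. housekeeping[] stands in for
 * housekeeping_cpumask(HK_TYPE_MANAGED_IRQ) and mq_map[] for
 * qmap->mq_map. Entries for housekeeping CPUs are left untouched.
 * Returns the number of isolated CPUs that were mapped.
 */
unsigned int map_isolated_cpus(const bool housekeeping[NR_CPUS],
			       unsigned int queue_offset,
			       unsigned int nr_queues,
			       unsigned int mq_map[NR_CPUS])
{
	unsigned int cpu, queue = 0, mapped = 0;

	assert(nr_queues > 0);	/* the modulo below needs a non-zero divisor */

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (housekeeping[cpu])
			continue;	/* models cpumask_andnot() dropping housekeeping CPUs */
		mq_map[cpu] = queue_offset + queue;
		queue = (queue + 1) % nr_queues;
		mapped++;
	}
	return mapped;
}
```

With CPUs 0-1 as housekeeping and two queues, the six isolated CPUs
alternate between queue 0 and queue 1, so every isolated CPU ends up
with a hardware queue assigned in the map.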