Hi Ming,

On Sat, Jan 11, 2025 at 11:31:10AM +0800, Ming Lei wrote:
> > What about a commit message like:
> >
> > When isolcpus=managed_irq is enabled, and the last housekeeping CPU for
> > a given hardware context goes offline, there is no CPU left which
> > handles the IOs anymore. If isolated CPUs mapped to this hardware
> > context are online and an application running on these isolated CPUs
> > issue an IO this will lead to stalls.
>
> It isn't correct, the in-tree code doesn't have such stall, no matter if
> IO is issued from HK or isolated CPUs since the managed irq is guaranteed to
> live if any mapped CPU is online.

Yes, it has different properties.

> Please see irq_do_set_affinity():
>
>         if (irqd_affinity_is_managed(data) &&
>             housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) {
>                 const struct cpumask *hk_mask;
>
>                 hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
>
>                 cpumask_and(&tmp_mask, mask, hk_mask);
>                 if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
>                         prog_mask = mask;
>                 else
>                         prog_mask = &tmp_mask;
>         } else {
>                 prog_mask = mask;
>
> The whole mask which may include isolated CPUs is only programmed to
> hardware if there isn't any online CPU in `irq_mask & hk_mask`.

This is not what I try to achieve here. The main motivation with this
series is that isolated CPUs are never serving IRQs.

> > I was talking about implementing the feature which would remap the
> > isolated CPUs to online hardware context when the current hardware
> > context goes offline. I didn't find a solution which I think would be
> > worth presenting. All involved some sort of locking/refcounting in the
> > hotpath, which I think we should just avoid.
>
> I understand the trouble, but it is still one improvement from user
> viewpoint instead of feature since the interface of 'isolcpus=manage_irq'
> isn't changed.

Ah, I understood you wrong. I didn't want to upset you. I thought you
were fine by changing how managed_irq works.

> > Indeed, I forgot to update the documentation. I'll update it accordingly.
>
> It isn't documentation thing, it breaks the no-regression policy, which crosses
> our red-line.
>
> If you really want to move on, please add one new kernel command
> line with documenting the new usage which requires applications to
> offline CPU in order.

Sure, I'll bring the separate command line option back.

Thanks,
Daniel