From: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, May 26, 2022 11:55 AM > Subject: [PATCH] Drivers: hv: vmbus: Adding isolated cpu support for channel interrupts > mapping Let me suggest a more compact and precise Subject: Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs > > Adding support for vmbus channels to take isolated cpu in consideration > while assigning interrupt to different cpus. This also prevents user from > setting any isolated cpu to vmbus channel interrupt assignment by sysfs > entry. Isolated cpu can be configured by kernel command line parameter > 'isolcpus=managed_irq,<#cpu>'. Also, for the commit statement: When initially assigning a VMbus channel interrupt to a CPU, don't choose a managed IRQ isolated CPU (as specified on the kernel boot line with parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to change the CPU that a VMbus channel will interrupt, don't allow changing to a managed IRQ isolated CPU. > > Signed-off-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> > --- > drivers/hv/channel_mgmt.c | 18 ++++++++++++------ > drivers/hv/vmbus_drv.c | 6 ++++++ > 2 files changed, 18 insertions(+), 6 deletions(-) > > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c > index 97d8f56..e1fe029 100644 > --- a/drivers/hv/channel_mgmt.c > +++ b/drivers/hv/channel_mgmt.c > @@ -21,6 +21,7 @@ > #include <linux/cpu.h> > #include <linux/hyperv.h> > #include <asm/mshyperv.h> > +#include <linux/sched/isolation.h> > > #include "hyperv_vmbus.h" > > @@ -728,16 +729,20 @@ static void init_vp_index(struct vmbus_channel *channel) > u32 i, ncpu = num_online_cpus(); > cpumask_var_t available_mask; > struct cpumask *allocated_mask; > + const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ); > u32 target_cpu; > int numa_node; > > if (!perf_chn || > - !alloc_cpumask_var(&available_mask, GFP_KERNEL)) { > + !alloc_cpumask_var(&available_mask, GFP_KERNEL) || > + cpumask_empty(hk_mask)) { > /* > * If the channel is not a performance critical > * channel, bind it to VMBUS_CONNECT_CPU. > * In case alloc_cpumask_var() fails, bind it to > * VMBUS_CONNECT_CPU. > + * If all the cpus are isolated, bind it to > + * VMBUS_CONNECT_CPU. > */ > channel->target_cpu = VMBUS_CONNECT_CPU; > if (perf_chn) > @@ -758,17 +763,19 @@ static void init_vp_index(struct vmbus_channel *channel) > } > allocated_mask = &hv_context.hv_numa_map[numa_node]; > > - if (cpumask_equal(allocated_mask, cpumask_of_node(numa_node))) { > +retry: > + cpumask_xor(available_mask, allocated_mask, cpumask_of_node(numa_node)); There's a bug here that existed in the code prior to this patch. The code checks to make sure cpumask_of_node(numa_node) is not empty, and then later references cpumask_of_node(numa_node) again. But in between the check and the use, one or more CPUs could go offline, leaving cpumask_of_node(numa_node) empty since that array of cpumasks contains only online CPUs. In such a case, execution could get stuck in an infinite loop with available_mask being empty. The solution is to call cpus_read_lock() before starting the main "for" loop and then call cpus_read_unlock() at the end. This lock will prevent CPUs from going offline, and hence ensure that the node mask can't become empty. You'll notice that target_cpu_store() uses that lock to prevent a similar problem. Fixing this locking problem should probably be a separate patch. Michael > + cpumask_and(available_mask, available_mask, hk_mask); > + > + if (cpumask_empty(available_mask)) { > /* > * We have cycled through all the CPUs in the node; > * reset the allocated map. > */ > cpumask_clear(allocated_mask); > + goto retry; > } > > - cpumask_xor(available_mask, allocated_mask, > - cpumask_of_node(numa_node)); > - > target_cpu = cpumask_first(available_mask); > cpumask_set_cpu(target_cpu, allocated_mask); > > @@ -778,7 +785,6 @@ static void init_vp_index(struct vmbus_channel *channel) > } > > channel->target_cpu = target_cpu; > - > free_cpumask_var(available_mask); > } Removing the blank line above is a gratuitous change that isn't needed. Generally, a patch should avoid such changes unless the purpose of the patch is code cleanup. > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 714d549..23660a8 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -21,6 +21,7 @@ > #include <linux/kernel_stat.h> > #include <linux/clockchips.h> > #include <linux/cpu.h> > +#include <linux/sched/isolation.h> > #include <linux/sched/task_stack.h> > > #include <linux/delay.h> > @@ -1770,6 +1771,11 @@ static ssize_t target_cpu_store(struct vmbus_channel > *channel, > if (target_cpu >= nr_cpumask_bits) > return -EINVAL; > > + if (!cpumask_test_cpu(target_cpu, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) { > + dev_err(&channel->device_obj->device, > + "cpu (%d) is isolated, can't be assigned\n", target_cpu); I don't think a message should be output here. The other errors in this function don't output a message. Generally, the kernel doesn't output a message just because a user provided bad input. Doing so makes it too easy for a user (even a sysadmin) to cause the kernel to go wild outputting messages. Michael > + return -EINVAL; > + } > /* No CPUs should come up or down during this. */ > cpus_read_lock(); > > -- > 1.8.3.1