Re: [PATCH] Drivers: hv: vmbus: Adding isolated cpu support for channel interrupts mapping

Saurabh Singh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> · Thu, 26 May 2022 21:30:01 -0700



On Thu, May 26, 2022 at 08:45:33PM +0000, Michael Kelley (LINUX) wrote:
> From: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, May 26, 2022 11:55 AM
> 
> > Subject: [PATCH] Drivers: hv: vmbus: Adding isolated cpu support for channel interrupts
> > mapping
> 
> Let me suggest a more compact and precise Subject:
> 
> Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
[sss]: ok

> 
> > 
> > Adding support for vmbus channels to take isolated cpu in consideration
> > while assigning interrupt to different cpus. This also prevents user from
> > setting any isolated cpu to vmbus channel interrupt assignment by sysfs
> > entry. Isolated cpu can be configured by kernel command line parameter
> > 'isolcpus=managed_irq,<#cpu>'.
> 
> Also, for the commit statement:
> 
> When initially assigning a VMbus channel interrupt to a CPU, don't choose
> a managed IRQ isolated CPU (as specified on the kernel boot line with
> parameter 'isolcpus=managed_irq,<#cpu>').  Also, when using sysfs to
> change the CPU that a VMbus channel will interrupt, don't allow changing
> to a managed IRQ isolated CPU.  
>
[sss] : ok 
> > 
> > Signed-off-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx>
> > ---
> >  drivers/hv/channel_mgmt.c | 18 ++++++++++++------
> >  drivers/hv/vmbus_drv.c    |  6 ++++++
> >  2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 97d8f56..e1fe029 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -21,6 +21,7 @@
> >  #include <linux/cpu.h>
> >  #include <linux/hyperv.h>
> >  #include <asm/mshyperv.h>
> > +#include <linux/sched/isolation.h>
> > 
> >  #include "hyperv_vmbus.h"
> > 
> > @@ -728,16 +729,20 @@ static void init_vp_index(struct vmbus_channel *channel)
> >  	u32 i, ncpu = num_online_cpus();
> >  	cpumask_var_t available_mask;
> >  	struct cpumask *allocated_mask;
> > +	const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
> >  	u32 target_cpu;
> >  	int numa_node;
> > 
> >  	if (!perf_chn ||
> > -	    !alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
> > +	    !alloc_cpumask_var(&available_mask, GFP_KERNEL) ||
> > +	    cpumask_empty(hk_mask)) {
> >  		/*
> >  		 * If the channel is not a performance critical
> >  		 * channel, bind it to VMBUS_CONNECT_CPU.
> >  		 * In case alloc_cpumask_var() fails, bind it to
> >  		 * VMBUS_CONNECT_CPU.
> > +		 * If all the cpus are isolated, bind it to
> > +		 * VMBUS_CONNECT_CPU.
> >  		 */
> >  		channel->target_cpu = VMBUS_CONNECT_CPU;
> >  		if (perf_chn)
> > @@ -758,17 +763,19 @@ static void init_vp_index(struct vmbus_channel *channel)
> >  		}
> >  		allocated_mask = &hv_context.hv_numa_map[numa_node];
> > 
> > -		if (cpumask_equal(allocated_mask, cpumask_of_node(numa_node))) {
> > +retry:
> > +		cpumask_xor(available_mask, allocated_mask, cpumask_of_node(numa_node));
> 
> There's a bug here that existed in the code prior to this patch.  The code
> checks to make sure cpumask_of_node(numa_node) is not empty, and then
> later references cpumask_of_node(numa_node) again.  But in between the
> check and the use, one or more CPUs could go offline, leaving 
> cpumask_of_node(numa_node) empty since that array of cpumasks contains
> only online CPUs.  In such a case, execution could get stuck in an infinite
> loop with available_mask being empty.
> 
> The solution is to call cpus_read_lock() before starting the main "for"
> loop and then call cpus_read_unlock() at the end.  This lock will prevent
> CPUs from going offline, and hence ensure that the node mask can't
> become empty.   You'll notice that target_cpu_store() uses that lock
> to prevent a similar problem.
> 
> Fixing this locking problem should probably be a separate patch.
> 
> Michael
[sss] : Got it, will send this fix after this patch review is complete.

> 
> > +		cpumask_and(available_mask, available_mask, hk_mask);
> > +
> > +		if (cpumask_empty(available_mask)) {
> >  			/*
> >  			 * We have cycled through all the CPUs in the node;
> >  			 * reset the allocated map.
> >  			 */
> >  			cpumask_clear(allocated_mask);
> > +			goto retry;
> >  		}
> > 
> > -		cpumask_xor(available_mask, allocated_mask,
> > -			    cpumask_of_node(numa_node));
> > -
> >  		target_cpu = cpumask_first(available_mask);
> >  		cpumask_set_cpu(target_cpu, allocated_mask);
> > 
> > @@ -778,7 +785,6 @@ static void init_vp_index(struct vmbus_channel *channel)
> >  	}
> > 
> >  	channel->target_cpu = target_cpu;
> > -
> >  	free_cpumask_var(available_mask);
> >  }
> 
> Removing the blank line above is a gratuitous change that isn't needed.
> Generally, a patch should avoid such changes unless the purpose of
> the patch is code cleanup.
>
[sss] : Got in by mistake, will remove

> > 
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > index 714d549..23660a8 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -21,6 +21,7 @@
> >  #include <linux/kernel_stat.h>
> >  #include <linux/clockchips.h>
> >  #include <linux/cpu.h>
> > +#include <linux/sched/isolation.h>
> >  #include <linux/sched/task_stack.h>
> > 
> >  #include <linux/delay.h>
> > @@ -1770,6 +1771,11 @@ static ssize_t target_cpu_store(struct vmbus_channel
> > *channel,
> >  	if (target_cpu >= nr_cpumask_bits)
> >  		return -EINVAL;
> > 
> > +	if (!cpumask_test_cpu(target_cpu, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) {
> > +		dev_err(&channel->device_obj->device,
> > +			"cpu (%d) is isolated, can't be assigned\n", target_cpu);
> 
> I don't think a message should be output here.  The other errors in this
> function don't output a message.  Generally, the kernel doesn't output
> a message just because a user provided bad input.  Doing so makes it
> too easy for a user (even a sysadmin) to cause the kernel to go wild
> outputting messages.
> 
> Michael
> 
[sss] : sure, will remove

> > +		return -EINVAL;
> > +	}
> >  	/* No CPUs should come up or down during this. */
> >  	cpus_read_lock();
> > 
> > --
> > 1.8.3.1