> -----Original Message----- > From: Jakub Kicinski <kuba@xxxxxxxxxx> > Sent: Thursday, August 15, 2024 12:09 PM > To: Erni Sri Satya Vennela <ernis@xxxxxxxxxxxxxxxxxxx> > Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang > <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui > <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; > pabeni@xxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx; Erni Sri Satya Vennela <ernis@xxxxxxxxxxxxx> > Subject: Re: [PATCH v2] net: netvsc: Update default VMBus channels > > On Wed, 14 Aug 2024 09:59:13 -0700 Erni Sri Satya Vennela wrote: > > Change VMBus channels macro (VRSS_CHANNEL_DEFAULT) in > > Linux netvsc from 8 to 16 to align with Azure Windows VM > > and improve networking throughput. > > > > For VMs having less than 16 vCPUS, the channels depend > > on number of vCPUs. Between 16 to 32 vCPUs, the channels > > default to VRSS_CHANNEL_DEFAULT. For greater than 32 vCPUs, > > set the channels to number of physical cores / 2 as a way > > to optimize CPU resource utilization and scale for high-end > > processors with many cores. > > Maximum number of channels are by default set to 64. > > > > Based on this change the subchannel creation would change as follows: > > > > ------------------------------------------------------------- > > |No. of vCPU |dev_info->num_chn |subchannel created | > > ------------------------------------------------------------- > > | 0-16 | 16 | vCPU | > > | >16 & <=32 | 16 | 16 | > > | >32 & <=128 | vCPU/2 | vCPU/2 | > > | >128 | vCPU/2 | 64 | > > ------------------------------------------------------------- > > > > Performance tests showed significant improvement in throughput: > > - 0.54% for 16 vCPUs > > - 0.83% for 32 vCPUs > > - 1.76% for 48 vCPUs > > - 10.35% for 64 vCPUs > > - 13.47% for 96 vCPUs > > Is there anything that needs clarifying in my feedback on v1? > > https://lore.kernel.org/all/20240807201857.445f9f95@xxxxxxxxxx/ > > Ignoring maintainer feedback is known to result in angry outbursts. Your suggestion on netif_get_num_default_rss_queues() is not ignored. We discussed internally on the formula we used for the num_chn, and chose a similar formula for higher number of vCPUs as in netif_get_num_default_rss_queues(). For lower number of vCPUs, we use the same default as Windows guests, because we don't want any potential regression. So, the end result is a step function: > > ------------------------------------------------------------- > > |No. of vCPU |dev_info->num_chn |subchannel created | > > ------------------------------------------------------------- > > | 0-16 | 16 | vCPU | > > | >16 & <=32 | 16 | 16 | > > | >32 & <=128 | vCPU/2 | vCPU/2 | > > | >128 | vCPU/2 | 64 | > > ------------------------------------------------------------- @Erni Sri Satya Vennela Next time, please reply to maintainer's emails to LKML, regarding how you think of the suggestions. Also, netif_get_num_default_rss_queues() uses #phys cores, which is different from num_online_cpus(). You can try like below, in addition to your comparison test, see if it's better than the patch v2. dev_info->num_chn = netif_get_num_default_rss_queues(); if (dev_info->num_chn < VRSS_CHANNEL_DEFAULT) dev_info->num_chn = VRSS_CHANNEL_DEFAULT; Thanks, - Haiyang