On Tue, Feb 07, 2017 at 10:15:06AM -0800, Ben Serebrin wrote:
> From: Benjamin Serebrin <serebrin@xxxxxxxxxx>
>
> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts or the xps
> aggregation interrupt. (In contrast, the driver does assign
> both if the counts of VCPUs and queues are equal, which is a good
> default behavior.)
>
> Google Compute Engine currently provides 1 queue pair for
> every VCPU, but limits that at a maximum of 32 queue pairs.
>
> This code extends the driver's default interrupt affinity
> and transmit affinity settings for the case where there
> are mismatching queue and VCPU counts. Userspace affinity
> adjustment may always be needed to tune for a given workload.

IIRC irqbalance will bail out and avoid touching affinity if you set
the affinity from the driver. Breaking that would not be nice - please
correct me if I'm wrong.

More generally, I wonder: we aren't the only device with a limited
number of queues.
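
To spell out the distinction the question hinges on - a rough sketch
only, not the actual virtio transport code, with affinity_sketch(),
my_irq and my_cpu as made-up placeholders - a driver can either publish
an affinity *hint*, which irqbalance is free to apply or ignore, or
force the affinity from inside the kernel:

#include <linux/interrupt.h>
#include <linux/cpumask.h>

static void affinity_sketch(unsigned int my_irq, int my_cpu)
{
	/*
	 * Publish a hint only: it shows up in
	 * /proc/irq/<my_irq>/affinity_hint and irqbalance decides what
	 * to do with it (its hintpolicy setting).
	 */
	irq_set_affinity_hint(my_irq, cpumask_of(my_cpu));

	/*
	 * Force the affinity from the kernel: this is reflected in
	 * /proc/irq/<my_irq>/smp_affinity, and is the case where the
	 * driver and irqbalance can end up fighting over the IRQ.
	 */
	irq_set_affinity(my_irq, cpumask_of(my_cpu));
}

Which of the two virtqueue_set_affinity() amounts to on a given
transport is what decides whether irqbalance actually gets broken here.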

> Tested:
>
> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>
> Without the fix we see all queues affinitized to all CPUs:
>
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
> 0-63
> [...]
> 0-63
>
> and we see all TX queues' xps_cpus affinitized to no cores:
>
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
> 00000000,00000000
> [...]
> 00000000,00000000
>
> With the fix, we see each queue assigned to a single core,
> and xps affinity set to 1 unique cpu per TX queue.
>
> 64 VCPU:
>
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
>
> 0-63
> 0
> 0
> 1
> 1
> 2
> 2
> 3
> 3
> 4
> 4
> 5
> 5
> 6
> 6
> 7
> 7
> 8
> 8
> 9
> 9
> 10
> 10
> 11
> 11
> 12
> 12
> 13
> 13
> 14
> 14
> 15
> 15
> 16
> 16
> 17
> 17
> 18
> 18
> 19
> 19
> 20
> 20
> 21
> 21
> 22
> 22
> 23
> 23
> 24
> 24
> 25
> 25
> 26
> 26
> 27
> 27
> 28
> 28
> 29
> 29
> 30
> 30
> 31
> 31
> 0-63
> 0-63
> 0-63
> 0-63
>
> cd /sys/class/net/eth0/queues
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>
> 00000001,00000001
> 00000002,00000002
> 00000004,00000004
> 00000008,00000008
> 00000010,00000010
> 00000020,00000020
> 00000040,00000040
> 00000080,00000080
> 00000100,00000100
> 00000200,00000200
> 00000400,00000400
> 00000800,00000800
> 00001000,00001000
> 00002000,00002000
> 00004000,00004000
> 00008000,00008000
> 00010000,00010000
> 00020000,00020000
> 00040000,00040000
> 00080000,00080000
> 00100000,00100000
> 00200000,00200000
> 00400000,00400000
> 00800000,00800000
> 01000000,01000000
> 02000000,02000000
> 04000000,04000000
> 08000000,08000000
> 10000000,10000000
> 20000000,20000000
> 40000000,40000000
> 80000000,80000000
>
> 48 VCPU:
>
> cd /proc/irq
> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
> 0-47
> 0
> 0
> 1
> 1
> 2
> 2
> 3
> 3
> 4
> 4
> 5
> 5
> 6
> 6
> 7
> 7
> 8
> 8
> 9
> 9
> 10
> 10
> 11
> 11
> 12
> 12
> 13
> 13
> 14
> 14
> 15
> 15
> 16
> 16
> 17
> 17
> 18
> 18
> 19
> 19
> 20
> 20
> 21
> 21
> 22
> 22
> 23
> 23
> 24
> 24
> 25
> 25
> 26
> 26
> 27
> 27
> 28
> 28
> 29
> 29
> 30
> 30
> 31
> 31
> 0-47
> 0-47
> 0-47
> 0-47
>
> cd /sys/class/net/eth0/queues
> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>
> 0001,00000001
> 0002,00000002
> 0004,00000004
> 0008,00000008
> 0010,00000010
> 0020,00000020
> 0040,00000040
> 0080,00000080
> 0100,00000100
> 0200,00000200
> 0400,00000400
> 0800,00000800
> 1000,00001000
> 2000,00002000
> 4000,00004000
> 8000,00008000
> 0000,00010000
> 0000,00020000
> 0000,00040000
> 0000,00080000
> 0000,00100000
> 0000,00200000
> 0000,00400000
> 0000,00800000
> 0000,01000000
> 0000,02000000
> 0000,04000000
> 0000,08000000
> 0000,10000000
> 0000,20000000
> 0000,40000000
> 0000,80000000
>
> Acked-by: Willem de Bruijn <willemb@xxxxxxxxxx>
> Acked-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Acked-by: Venkatesh Srinivas <venkateshs@xxxxxxxxxx>
>
> Signed-off-by: Ben Serebrin <serebrin@xxxxxxxxxx>

What happens if you have more than one virtio net device?

> ---
>  drivers/net/virtio_net.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 765c2d6358da..0dc3a102bfc4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
>  	 * queue pairs, we let the queue pairs to be private to one cpu by
>  	 * setting the affinity hint to eliminate the contention.
>  	 */
> -	if (vi->curr_queue_pairs == 1 ||
> -	    vi->max_queue_pairs != num_online_cpus()) {
> +	if (vi->curr_queue_pairs == 1) {
>  		virtnet_clean_affinity(vi, -1);
>  		return;
>  	}
>
> +	/* If there are more cpus than queues, then assign the queues'
> +	 * interrupts to the first cpus until we run out.
> +	 */
>  	i = 0;
>  	for_each_online_cpu(cpu) {
> +		if (i == vi->max_queue_pairs)
> +			break;
>  		virtqueue_set_affinity(vi->rq[i].vq, cpu);
>  		virtqueue_set_affinity(vi->sq[i].vq, cpu);
> -		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>  		i++;
>  	}
>
> +	/* Stripe the XPS affinities across the online CPUs.
> +	 * Hyperthread pairs are typically assigned such that Linux's
> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
> +	 * hyperthread twins to share TX queues, in the case where there are
> +	 * more cpus than queues.

Couldn't you add some kind of API so that we don't need to make
assumptions like this?  E.g. "give me a new CPU core to use for an
interrupt"?  Would address the multiple-device thing too.

> +	 */
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		struct cpumask mask;
> +		int skip = i;
> +
> +		cpumask_clear(&mask);
> +		for_each_online_cpu(cpu) {
> +			while (skip--)
> +				cpu = cpumask_next(cpu, cpu_online_mask);
> +			if (cpu < num_possible_cpus())
> +				cpumask_set_cpu(cpu, &mask);
> +			skip = vi->max_queue_pairs - 1;
> +		}
> +		netif_set_xps_queue(vi->dev, &mask, i);
> +	}
> +

Doesn't look like this will handle the case of num cpus < num queues
well - see the small model of the loop after the quoted patch below.

>  	vi->affinity_hint_set = true;
>  }
> --
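
To make the num-cpus-vs-num-queues point concrete, here is a small
userspace model of the XPS striping loop above (my reading of it; the
stripe() helper is made up, the online CPUs are assumed to be a
contiguous 0..N-1, and a 64-bit word stands in for the cpumask - it is
not the driver code itself):

#include <stdio.h>

static void stripe(int num_cpus, int num_queues)
{
	int q, cpu, skip;
	unsigned long long mask;

	printf("%d cpus, %d queues:\n", num_cpus, num_queues);
	for (q = 0; q < num_queues; q++) {
		mask = 0;
		skip = q;
		for (cpu = 0; cpu < num_cpus; cpu++) {
			while (skip--)
				cpu++;		/* stands in for cpumask_next() */
			if (cpu < num_cpus)	/* the num_possible_cpus() check */
				mask |= 1ULL << cpu;
			skip = num_queues - 1;
		}
		printf("  txq %2d -> cpus %016llx\n", q, mask);
	}
}

int main(void)
{
	stripe(64, 32);	/* the 64-VCPU case from the log above */
	stripe(2, 4);	/* more queues than online cpus */
	return 0;
}

With 64 CPUs and 32 queues this prints the same bit pattern as the
00000001,00000001 entries in the log above; with 2 CPUs and 4 queues,
txq 2 and txq 3 come out with empty masks.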