> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Monday, April 27, 2015 6:30 AM
> To: KY Srinivasan
> Cc: Dexuan Cui; Haiyang Zhang; devel@xxxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels
> among all vcpus
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Friday, April 24, 2015 2:05 AM
> >> To: Dexuan Cui
> >> Cc: KY Srinivasan; Haiyang Zhang; devel@xxxxxxxxxxxxxxxxxxxxxx;
> >> linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels
> >> among all vcpus
> >>
> >> Dexuan Cui <decui@xxxxxxxxxxxxx> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> >> Sent: Tuesday, April 21, 2015 22:28
> >> >> To: KY Srinivasan
> >> >> Cc: Haiyang Zhang; devel@xxxxxxxxxxxxxxxxxxxxxx;
> >> >> linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui
> >> >> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels
> >> >> among all vcpus
> >> >>
> >> >> Primary channels are distributed evenly across all vcpus we have.
> >> >> When the host asks us to create subchannels it usually makes us
> >> >> num_cpus-1 offers
> >> >
> >> > Hi Vitaly,
> >> > AFAIK, in the VSP of storvsc, the number of subchannels is
> >> > (the_number_of_vcpus - 1) / 4.
> >> >
> >> > This means for an 8-vCPU guest, there is only 1 subchannel.
> >> >
> >> > Your new algorithm tends to make the vCPUs with small numbers
> >> > busier: e.g., in the 8-vCPU case, assuming we have 4 SCSI
> >> > controllers:
> >> > vCPU0: scsi0's PrimaryChannel (P)
> >> > vCPU1: scsi0's SubChannel (S) + scsi1's P
> >> > vCPU2: scsi1's S + scsi2's P
> >> > vCPU3: scsi2's S + scsi3's P
> >> > vCPU4: scsi3's S
> >> > vCPU5, 6 and 7 are idle.
> >> >
> >> > In this special case, the existing algorithm is better. :-)
> >> >
> >> > However, I do like this idea in your patch, that is, making sure a
> >> > device's primary/sub channels are assigned to different vCPUs.
> >>
> >> Under special circumstances, with the current code we can end up
> >> having all subchannels on the same vCPU as the primary channel, I
> >> guess :-) This is not something common, but possible.
> >>
> >> >
> >> > I'm just wondering if we should use an even better (and more
> >> > complex) algorithm :-)
> >>
> >> The question here is - does sticking to the current vCPU help? If it
> >> does, I can suggest the following (I think I even mentioned that in
> >> my PATCH 00): first we try to find a (sub)channel with target_cpu ==
> >> current_vcpu and only when we fail do we do the round robin. I'd
> >> like to hear K.Y.'s opinion here, as he's the original author :-)
> >
> > Sorry for the delayed response. Initially I had implemented a scheme
> > that would pick an outgoing CPU closest to the CPU on which the
> > request came (to maintain cache locality, especially on NUMA
> > systems). I changed this algorithm to spread the load more uniformly
> > as we were trying to improve Linux IOPS on Azure XIO (premium
> > storage). We are currently testing this code on our Converged
> > Offering, CPS, and I am finding that the perf as measured by IOPS
> > has regressed. I have not narrowed down the reason for this
> > regression, and it may very well be the change in the algorithm for
> > selecting the outgoing channel. In general, I don't think the logic
> > here needs to be exact, and locality (being on the same CPU or
> > within the same NUMA node) is important. Any change to this
> > algorithm will have to be validated on different MSFT environments
> > (Azure XIO, CPS etc.).
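As a quick illustration of the layout in Dexuan's example above, here is a
toy userspace model (not the actual vmbus binding code; the constants and
the modulo placement are assumptions made only for the illustration).
With 8 vCPUs, 4 controllers and 1 subchannel per device, it prints exactly
the table above:

        /*
         * Toy model of the placement in Dexuan's example: primaries are
         * spread round-robin across vCPUs, and each device's subchannels
         * land on the vCPUs that follow its primary.
         */
        #include <stdio.h>

        #define NR_VCPUS     8
        #define NR_DEVICES   4
        #define SUBCHANNELS  1   /* (8 - 1) / 4 per the storvsc policy above */

        int main(void)
        {
                int dev, sub;

                for (dev = 0; dev < NR_DEVICES; dev++) {
                        int cpu = dev % NR_VCPUS;       /* primary placement */

                        printf("scsi%d P -> vCPU%d\n", dev, cpu);
                        for (sub = 1; sub <= SUBCHANNELS; sub++)
                                printf("scsi%d S -> vCPU%d\n",
                                       dev, (cpu + sub) % NR_VCPUS);
                }
                return 0;
        }

Its output shows vCPU1 through vCPU3 each hosting two channels while
vCPU5-7 stay idle, which is exactly the imbalance being discussed.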
> Thanks, can you please compare two algorithms here:
> 1) Simple round robin (the one my patch series implements, but with the
>    issues fixed; I'll send v2).
> 2) Try to find a (sub)channel with a matching VCPU and round-robin when
>    we fail (I can actually include it in v2).
> We can later decide something based on these testing results.

We will do some testing.

K. Y

> >
> > Regards,
> >
> > K. Y
>
> --
> Vitaly
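To make the second option concrete, here is a minimal sketch of the
prefer-local, round-robin-fallback selection; struct channel, pick_channel
and the demo layout below are hypothetical stand-ins, not the real
hv_vmbus types:

        #include <stdio.h>

        struct channel {
                int target_cpu;         /* vCPU this channel interrupts */
        };

        /* Prefer a channel bound to the requesting CPU; else round robin. */
        static struct channel *pick_channel(struct channel *chans,
                                            int nr_chans, int cur_cpu,
                                            int *next_rr)
        {
                int i;

                /* Fast path: a channel already lives on this CPU. */
                for (i = 0; i < nr_chans; i++)
                        if (chans[i].target_cpu == cur_cpu)
                                return &chans[i];

                /* Slow path: fall back to round robin. */
                i = *next_rr;
                *next_rr = (*next_rr + 1) % nr_chans;
                return &chans[i];
        }

        int main(void)
        {
                struct channel chans[] = { {0}, {1}, {2} }; /* P + 2 subs */
                int rr = 0;
                int cpu;

                for (cpu = 0; cpu < 4; cpu++)
                        printf("request on vCPU%d -> channel on vCPU%d\n",
                               cpu,
                               pick_channel(chans, 3, cpu, &rr)->target_cpu);
                return 0;
        }

The fast path preserves cache locality whenever a channel is already bound
to the requesting CPU; only when no channel matches does the selection fall
back to the even round-robin spread of option 1.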