Re: vhost: multiple worker support

On 6/3/21 5:13 AM, Stefan Hajnoczi wrote:
> On Tue, May 25, 2021 at 01:05:51PM -0500, Mike Christie wrote:
>> Results:
>> --------
>> When running with the null_blk driver and vhost-scsi I can get 1.2
>> million IOPs by just running a simple
>>
>> fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
>> --iodepth=128  --numjobs=8 --time_based --group_reporting --name=iops
>> --runtime=60 --eta-newline=1
>>
>> The VM has 8 vCPUs and sda has 8 virtqueues, and we can do a total of
>> 1024 cmds per device. To get 1.2 million IOPs I did have to tune and
>> run the virsh emulatorpin command so the vhost threads were running
>> on different CPUs than the VM. If the vhost threads share CPUs with
>> the VM then I get around 800K.
>>
>> For a more realistic device that is also a CPU hog, like iscsi, I can
>> still get 1 million IOPs using 1 dm-multipath device over 8 iscsi paths
>> (natively it gets 1.1 million IOPs).
> 
> There is no comparison against a baseline, but I guess it would be the
> same 8 vCPU guest with single queue vhost-scsi?
> 

For the iscsi device, the max for the single thread case was around
380K IOPs.
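
For reference, the pinning mentioned in the quoted text above was done
with the standard libvirt CPU tuning commands. A minimal sketch, assuming
a domain named vm0, an 8 vCPU guest and a 16 CPU host (the domain name
and CPU numbers are only for illustration):

# pin the guest vCPUs to host CPUs 0-7, one per vCPU
for v in $(seq 0 7); do virsh vcpupin vm0 $v $v --live; done
# keep the emulator and vhost worker threads on host CPUs 8-15
virsh emulatorpin vm0 8-15 --live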

Here are the results with null_blk as the backend device and a 16
vCPU guest, to give you a better picture.

fio numjobs
VQs     1        2        4        8        12       16
--------------------------------------------------------

Current upstream (single thread per vhost-scsi device).
After 8 jobs there was no perf diff.
********************************************************
1       130k     338k     390k     404k     -        -
2       146k     440k     448k     478k     -        -
4       146k     456k     448k     482k     -        -
8       154k     464k     500k     490k     -        -
12      160k     454k     486k     490k     -        -
16      162k     460k     484k     486k     -        -

Thread per VQ:
After 16 jobs there was no perf diff even if I increased
the number of guest vCPUs.
*********************************************************
1       same as above
2       166k     320k     542k     664k     558k     658k
4       156k     310k     660k     986k     860k     890k
8       156k     328k     652k     988k     972k     1074k
12      162k     336k     660k     1172k    1190k    1324k
16      162k     332k     664k     1398k    850k     1426k

Note:
- For numjobs > 8, I lowered iodepth so we had a total of 1024
cmds over all jobs (see the example after these notes).
- virtqueue_size/cmd_per_lun=1024 was used for all tests.
- If I modify vhost-scsi so that vhost_scsi_handle_vq queues the
response immediately and we never enter the LIO/block/scsi layers,
then I can get around 1.6-1.8M IOPs as the max.
- There are some device-wide locks in the LIO main IO path that
we are hitting in these results. We are working on removing them.
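
To make the first two notes concrete, the numjobs=16 runs used roughly
the fio command below, with iodepth dropped from 128 to 64 so that
16 * 64 = 1024 cmds total. The -device line is only a sketch of how the
queue sizes were bumped to 1024: the wwpn value is made up, and the
property names/availability can vary with the QEMU version.

fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
--iodepth=64 --numjobs=16 --time_based --group_reporting --name=iops
--runtime=60 --eta-newline=1

-device vhost-scsi-pci,wwpn=naa.500140512345678a,num_queues=8,virtqueue_size=1024,cmd_per_lun=1024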


