Re: RFC: Improve performance of macvtap device creation

Tony Krowiak <akrowiak@xxxxxxxxxxxxxxxxxx> · Fri, 30 Oct 2015 11:56:31 -0400

On 10/30/2015 06:49 AM, Michal Privoznik wrote:
On 29.10.2015 18:48, Laine Stump wrote:
On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it
takes an exceedingly long time to boot the guest. In a test of a guest
domain configured with 82 macvtap devices, it took over two minutes
for the guest to boot. An strace of the ioctl calls during guest
startup showed the SIOCGIFFLAGS ioctl being invoked 3,403 times.
I was able to isolate the source of the ioctl calls to
the virNetDevMacVLanCreateWithVPortProfile function
in virnetdevmacvlan.c. The macvtap interface name is created by
looping over a counter variable, starting with zero, and appending the
counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done
that way, and why there had never been any performance complaints.
Lacking any complaints, I promptly forgot about it (until the next time
I went past the code for some other tangentially related reason.)

Since you're the first to complain, you have the honor of fixing it :-)

With each iteration, a call is made to virNetDevExists (SIOCGIFFLAGS
ioctl) to determine if a device with that name already exists, until a
unique name is created. In the test case cited above, to create an
interface name for the 82nd macvtap device, the virNetDevExists
function will be called for interface names 'macvtap0' to 'macvtap80'
before it is determined that 'macvtap81' can be used. So if N is the
number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
ioctl will be invoked (N x N + N)/2 times to find unused macvtap
device names. That's assuming only one guest is being started; who
knows how many times the ioctl may have to be called in an
installation running a large number of guests defined with macvtap
devices.
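
To make the (N x N + N)/2 figure concrete, here is a minimal sketch that simulates the linear scan and counts the probes. It is not the libvirt code: name_exists() is a hypothetical stand-in for virNetDevExists, and it assumes the only interfaces on the host are the ones created so far, numbered from zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of simulated existence checks (one SIOCGIFFLAGS ioctl each). */
static size_t probe_count;

/* Hypothetical stand-in for virNetDevExists(): suffixes 0..in_use-1 are taken. */
bool name_exists(size_t suffix, size_t in_use)
{
    probe_count++;
    return suffix < in_use;
}

/* Linear scan from zero, as described above; returns the first free suffix. */
size_t first_free_suffix(size_t in_use)
{
    size_t suffix = 0;

    while (name_exists(suffix, in_use))
        suffix++;
    return suffix;
}

/* Total probes needed to create n devices one after another. */
size_t probes_for_n_devices(size_t n)
{
    probe_count = 0;
    for (size_t created = 0; created < n; created++)
        first_free_suffix(created);
    return probe_count;
}
```

Calling probes_for_n_devices(82) gives 3403, which matches the count seen in the strace.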
Not only that, but until c0d162c68c2f19af8d55a435a9e372da33857048
(contained in v1.2.2~32), if two threads were starting a domain concurrently,
they even competed with each other in that specific area of the code.

I was able to reduce the amount of time for starting a guest domain
defined with 82 macvtap devices from over 2 minutes to about 14
seconds by keeping track of the interface name suffixes previously
used. I defined two static bit maps (virBitmap), one each for macvtap
and macvlan device name suffixes. When a macvtap/macvlan device is
created, the index of the next clear bit (virBitmapNextClearBit) is
retrieved to create the name. If an interface with that name does not
exist, the device is created and the bit at the index used to create
the interface name is set (virBitmapSetBit). When a macvtap/macvlan
device is deleted, if the interface name has the pattern 'macvtap%d'
or 'macvlan%d', the suffix is parsed into a bit index and used to
clear the bit (virBitmapClearBit) in the respective bitmap.
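
The bookkeeping described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed patch: a plain bool array stands in for virBitmap, the id_map type, MAX_IDS and all function names are hypothetical, and the existence check and the actual device creation are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 1024    /* assumed upper bound, not taken from the post */

/* One flag per name suffix; stands in for a virBitmap. */
typedef struct {
    bool used[MAX_IDS];
} id_map;

/* Role of virBitmapNextClearBit: lowest unused suffix, or -1 if none. */
int id_map_next_clear(const id_map *map)
{
    for (int i = 0; i < MAX_IDS; i++)
        if (!map->used[i])
            return i;
    return -1;
}

/* Parse "<prefix><digits>" into its suffix; -1 if the name does not match. */
int parse_suffix(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);
    char *end;
    long id;

    if (strncmp(name, prefix, plen) != 0)
        return -1;
    if (name[plen] < '0' || name[plen] > '9')
        return -1;
    id = strtol(name + plen, &end, 10);
    if (*end != '\0' || id >= MAX_IDS)
        return -1;
    return (int)id;
}

/* Pick the next free suffix, write the name into buf, and mark it used
 * (role of virBitmapSetBit). Returns the suffix, or -1 if the map is full. */
int id_map_reserve(id_map *map, const char *prefix, char *buf, size_t buflen)
{
    int id = id_map_next_clear(map);

    if (id < 0)
        return -1;
    snprintf(buf, buflen, "%s%d", prefix, id);
    map->used[id] = true;
    return id;
}

/* On delete: clear the bit if the name follows the pattern
 * (role of virBitmapClearBit); other names are ignored. */
void id_map_release(id_map *map, const char *name, const char *prefix)
{
    int id = parse_suffix(name, prefix);

    if (id >= 0)
        map->used[id] = false;
}
```

One map per prefix ('macvtap' and 'macvlan') mirrors the two static bitmaps mentioned above.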
This sounds fine, as long as 1) you recreate the bitmap whenever
libvirtd is restarted (while scanning through all the interfaces of
every domain; there is already code being executed in exactly the right
place - look for qemu_process.c:qemuProcessNotifyNets() and add
appropriate code inside the loop there), and 2) you retry some number of
times if a supposedly unused device name is actually in use (to account
for processes other than libvirt using the same naming convention).
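
Point 1 above, rebuilding the bitmap after a restart, might be sketched as follows. This is only an illustration: rebuild_used(), MAX_IDS and the bool array are hypothetical, and the list of names is assumed to have been collected elsewhere (for example while walking the domains' interfaces).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 1024    /* assumed upper bound, not taken from the post */

/* Reset the map, then mark the suffix of every name of the form
 * "<prefix><digits>". Returns how many distinct suffixes were marked. */
size_t rebuild_used(bool used[MAX_IDS], const char *prefix,
                    const char *const names[], size_t n_names)
{
    size_t plen = strlen(prefix);
    size_t marked = 0;

    memset(used, 0, MAX_IDS * sizeof(bool));
    for (size_t i = 0; i < n_names; i++) {
        const char *digits;
        char *end;
        long id;

        if (strncmp(names[i], prefix, plen) != 0)
            continue;           /* different prefix or a user-chosen name */
        digits = names[i] + plen;
        if (*digits < '0' || *digits > '9')
            continue;           /* no numeric suffix */
        id = strtol(digits, &end, 10);
        if (*end != '\0' || id >= MAX_IDS)
            continue;           /* trailing junk or out of range */
        if (!used[id]) {
            used[id] = true;
            marked++;
        }
    }
    return marked;
}
```

Names that do not follow the pattern are skipped, so user-supplied interface names never touch the map.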
How about re-using the approach we have for virPortAllocator? We
maintain a bitmap of ports. On acquiring a new port, we try to bind() to
it. If we succeed, we set the corresponding bit in the bitmap. Of
course it may happen that a port in the host is already taken but our
bitmap does not think so. That's okay. We just leave the corresponding
bit alone, because if we set it as used, nobody would ever unset it.
Moreover, we will try the port next time, and it may be free.

Moreover, the bitmap is not saved anywhere, nor restored on daemon
restart - this could be changed though.
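
Applied to interface names, the port-allocator idea might be sketched like this. It is a hedged illustration, not virPortAllocator itself: the exists_fn callback plays the role that bind() plays for ports (or virNetDevExists for interfaces), and acquire_id(), release_id(), one_foreign_id() and MAX_IDS are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 1024    /* assumed upper bound, not taken from the post */

/* Hypothetical probe: true if the host already has an interface with
 * this suffix, whoever created it. */
typedef bool (*exists_fn)(int id, void *opaque);

/* Find a suffix that we have not handed out and that is free on the host. */
int acquire_id(bool used[MAX_IDS], exists_fn exists, void *opaque)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (used[id])
            continue;       /* handed out by us earlier */
        if (exists(id, opaque))
            continue;       /* taken by someone else: leave the bit clear */
        used[id] = true;
        return id;
    }
    return -1;
}

/* Give a suffix back; only suffixes we marked ourselves are ever cleared. */
void release_id(bool used[MAX_IDS], int id)
{
    if (id >= 0 && id < MAX_IDS)
        used[id] = false;
}

/* Demo probe: pretends exactly one suffix, *(int *)opaque, is held by
 * another process. */
bool one_foreign_id(int id, void *opaque)
{
    return id == *(const int *)opaque;
}
```

Because a foreign name is skipped without setting its bit, it is probed again on the next acquisition and gets picked up once the other process releases it. In real code the probe and the creation are separate steps, so a failed creation would still need to be handled as "taken" and the scan continued.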

So what I am saying is practically the same as Laine, just extending his
thoughts and giving you an example of how to proceed further :)
I appreciate the input. This is similar to the first solution I
proposed, which I actually implemented and tested. It is described above.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list