On Thu, May 10, 2018 at 2:07 PM, Laine Stump <laine@xxxxxxxxxx> wrote:
> On 05/10/2018 02:53 PM, Ihar Hrachyshka wrote:
>> Hi,
>>
>> In kubevirt, we discovered [1] that whenever e1000 is used for vNIC,
>> link on the interface becomes ready several seconds after 'ifup' is
>> executed
>
> What is your definition of "becomes ready"? Are you looking at the
> output of "ip link show" in the guest? Or are you watching "brctl
> showstp" for the bridge device on the host? Or something else?

I was watching the guest dmesg for the following messages:

[    4.773275] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    6.769235] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    6.771408] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

For e1000, there are about 2 seconds between those messages; for virtio,
it's near instant. Interestingly, it happens only on the very first
ifup; when I bring the link up a second time after the guest has booted,
it's instant.

>
>> which for some buggy images like cirros may slow down boot
>> process for up to 1 minute [2]. If we switch from e1000 to virtio, the
>> link is brought up and ready almost immediately.
>>
>> For the record, I am using the following versions:
>> - L0 kernel: 4.16.5-200.fc27.x86_64 #1 SMP
>> - libvirt: 3.7.0-4.fc27
>> - guest kernel: 4.4.0-28-generic #47-Ubuntu
>>
>> Is there something specific about e1000 that makes it initialize the
>> link too slowly on libvirt or guest side?
>
> There isn't anything libvirt could do that would cause the link to
> IFF_UP up any faster or slower, so if there is an issue it's elsewhere.
> Since switching to the virtio device eliminates the problem, my guess
> would be that it's something about the implementation of the emulated
> device in qemu that is causing a delay in the e1000 driver in the guest.
> That's just a guess though.
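For what it's worth, the delay above can be extracted from a saved guest
dmesg log with a short awk sketch. This is only an illustration under the
assumption that the log lines look exactly like the ones I pasted (the
function name and log path are made up):

```shell
#!/bin/sh
# Sketch: compute the gap between NETDEV_UP and "link becomes ready"
# for eth0, given a file containing guest dmesg output.
# Assumes the standard "[  <seconds>] ..." dmesg timestamp prefix.
link_delay() {
    awk '/ADDRCONF\(NETDEV_UP\): eth0/     { up = $2 + 0 }
         /ADDRCONF\(NETDEV_CHANGE\): eth0/ { printf "%.2f\n", ($2 + 0) - up; exit }' "$1"
}

# Hypothetical usage: link_delay /tmp/guest-dmesg.log
```

(awk's numeric coercion of "$2 + 0" conveniently strips the trailing "]"
from the timestamp field.)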
>
>>
>> [1] https://github.com/kubevirt/kubevirt/issues/936
>> [2] https://bugs.launchpad.net/cirros/+bug/1768955
>
> (I discount the idea of the stp delay timer having an effect, as
> suggested in one of the comments on github that points to my explanation
> of STP in a libvirt bugzilla record, because that would cause the same
> problem for e1000 or virtio).

Yes, it's not STP: I also tried explicitly setting all bridge timers to
0, with no result. I also ran "tcpdump -i any" inside the container that
hosts the VM VIF, and there was no relevant traffic on the tap device.

>
> I hesitate to suggest this, because the rtl8139 code in qemu is
> considered less well maintained and lower performance than e1000, but
> have you tried setting that model to see how it behaves? You may be
> forced to make that the default when virtio isn't available.

Indeed, rtl8139 is near instant too:

[    4.156872] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[    4.177520] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1

Thanks for the tip, we will consider it too (and thanks for the
background info on the driver support state).

>
> Another thought - I guess the virtio driver in Cirros is always
> available? Perhaps kubevirt could use libosinfo to auto-decide what
> device to use for networking based on OS.
>

This, or we can introduce explicit tags for NICs / guest type to use.

Thanks a lot for the reply,
Ihar

_______________________________________________
libvirt-users mailing list
libvirt-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvirt-users
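[Editor's note: for readers following along, the NIC model being compared
throughout this thread is selected via the <model> element of the guest's
<interface> in the libvirt domain XML. A minimal sketch; the bridge name
'br0' is a placeholder:]

```xml
<!-- Illustrative interface definition; 'br0' is a placeholder. -->
<interface type='bridge'>
  <source bridge='br0'/>
  <!-- Swap 'virtio' for 'e1000' or 'rtl8139' to compare link-up behavior. -->
  <model type='virtio'/>
</interface>
```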