ping

On 24.09.2018 10:41, Nikolay Shirokovskiy wrote:
> Hi, all.
>
> On fat hosts which are capable of running hundreds of VMs, restarting libvirtd
> makes its services unavailable for a long time if the VMs use network filters.
> In my tests each of 100 VMs has the no-promisc [1] and no-mac-spoofing filters,
> and executing virsh list right after a daemon restart takes approximately 140s
> if firewalld is not running (that is, ebtables/iptables/ip6tables commands are
> used to configure kernel tables).
>
> The problem is that the daemon does not even start to read from client
> connections because the state drivers are not initialized. Initialization is
> blocked in state driver autostart, which grabs the VM locks, and the VM locks
> are held by the VM reconnection code. Each VM reloads its network tables on
> reconnection and this reloading is serialized on updateMutex in the gentech
> nwfilter driver. Working around autostart won't help much: even if the state
> drivers initialize, listing VMs won't be possible because listing also takes
> each VM lock one by one. However, managing a VM that has passed the
> reconnection phase will be possible, which takes the same 140s in the worst
> case.
>
> Note that this issue only applies if we use filter configurations that don't
> need IP learning. With IP learning the situation is different because the
> reconnection code spawns a new thread that applies network rules only after
> the IP is learned from traffic, and this thread does not grab the VM lock. As
> a result VMs are manageable, but reloading filters in the background takes
> approximately the same 140s. I guess managing network filters during this
> period can have issues too. Anyway, this situation does not look good, so
> fixing the described issue by spawning threads even without IP learning does
> not look nice to me.
>
> What speedup is possible with a conservative approach? First, for test
> purposes, we can remove the firewall ruleLock, the gentech driver updateMutex
> and the filter object mutex, which serve no function in the restart scenario.
> This gives a 36s restart time. The speedup is achieved because the heavy
> fork/preexec steps now run concurrently.
>
> Next we can try to reduce fork/preexec time. To estimate its contribution
> alone, let's bring back the above locks. It turns out most of the time is
> taken by fork itself and by closing 8k (on my system) file descriptors in
> preexec. Using vfork gives a 2x boost and so does dropping the mass close. (I
> checked the mass close contribution because I don't quite understand the
> purpose of this step; libvirt typically sets the close-on-exec flag on its
> descriptors.) So these two optimizations alone can result in a restart time
> of 30s.
>
> Unfortunately, combining the above two approaches does not give a boost equal
> to the product of the individual ones. The reason is that, due to concurrency
> and the high number of VMs (100), the preexec boost does not play a
> significant role, and using vfork diminishes concurrency as it freezes all
> parent threads before execve. So dropping locks and closes gives a 33s restart
> time and adding vfork to this gives a 25s restart time.
>
> Another approach is to use the --atomic-file option for ebtables
> (iptables/ip6tables unfortunately do not have one). The idea is to save the
> table to a file, edit the file, and commit the table to the kernel. I hoped
> this could give a performance boost because we don't need to load/store the
> kernel network table for a single rule update. In order to isolate approaches
> I also dropped all ip/ip6 updates, which cannot be done this way. In this
> approach we cannot drop ruleLock in the firewall because no other VM threads
> should change tables between save and commit. This approach gives a restart
> time of 25s. But this approach is broken anyway, as we cannot be sure another
> application doesn't change the network table between save and commit, in which
> case those changes will be lost.
>
> After all, I think we need to move in a different direction. We can add API to
> all binaries and firewalld to execute many commands in one run.
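
On the mass close question raised above: the step presumably guards against descriptors that lack the close-on-exec flag, since only flagged descriptors are closed by execve itself. Below is a minimal sketch of that behaviour, written in Python rather than libvirt's C for brevity. The helper names are made up for illustration, and close_fds=False is only there to switch off Python's own mass close so that the flag alone decides what the child inherits.

```python
import fcntl
import os
import subprocess
import sys

# Child program: exit status 0 if the descriptor named in argv[1] is open,
# 1 if it did not survive the exec.
CHILD = """
import os, sys
try:
    os.fstat(int(sys.argv[1]))
except OSError:
    sys.exit(1)
"""


def open_devnull(inheritable: bool) -> int:
    """Open /dev/null on a descriptor numbered >= 100 (hypothetical helper).

    The high number only avoids clashing with descriptors that the child
    interpreter may open for itself.
    """
    low = os.open(os.devnull, os.O_RDONLY)
    try:
        fd = fcntl.fcntl(low, fcntl.F_DUPFD, 100)
    finally:
        os.close(low)
    # inheritable=False sets FD_CLOEXEC, inheritable=True clears it.
    os.set_inheritable(fd, inheritable)
    return fd


def fd_survives_exec(fd: int) -> bool:
    """Return True if fd is still open in a freshly exec'ed child."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(fd)],
        close_fds=False,  # no mass close: only FD_CLOEXEC decides
    )
    return result.returncode == 0


if __name__ == "__main__":
    for inheritable in (False, True):
        fd = open_devnull(inheritable)
        survived = fd_survives_exec(fd)
        print(f"close-on-exec={not inheritable}: survives exec = {survived}")
        os.close(fd)
```

If every descriptor in the daemon reliably carries the flag, the mass close is redundant; a single unflagged descriptor (for instance one opened by a third-party library) would leak into the child without it.
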
> We can pass commands as arguments or write them into a file which is then
> given to the binary. Then libvirt itself can update, for example, the bridge
> network table in a couple of commands. The exact number depends on the new
> API. For example, if we add an option to delete chains recursively and an
> option not to fail on a NOENT error, we can change the table in one command
> (no listing of current rules is required).
>
> [1] no-promisc filter
>
> <filter name='no-promisc' chain='root' priority='-750'>
>   <uuid>6d055022-1192-4a3d-ae1f-576baa5564b6</uuid>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='ff:ff:ff:ff:ff:ff'/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='$MAC'/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00'/>
>   </rule>
>   <rule action='drop' direction='in' priority='500'>
>     <mac/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='01:00:5e:00:00:00' dstmacmask='ff:ff:ff:80:00:00'/>
>   </rule>
> </filter>
>
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list
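
The "many commands in one run" idea from the quoted message could look roughly like this on the caller's side. This is only a sketch: the one-command-per-line file format and the helper names are assumptions, no existing ebtables or firewalld interface is implied, and the chain name and rules are placeholders rather than what libvirt actually generates for no-promisc.

```python
import shlex


def build_batch(commands):
    """Serialise argument lists into one text blob, one command per line.

    The line-per-command format is an assumption for illustration; it also
    assumes that no argument contains a newline.
    """
    return "".join(shlex.join(args) + "\n" for args in commands)


def parse_batch(text):
    """Split a batch back into argument lists, as a batch-aware binary might."""
    return [shlex.split(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    chain = "libvirt-I-vnet0"  # placeholder chain name
    commands = [
        ["-t", "nat", "-N", chain],
        ["-t", "nat", "-A", chain, "-d", "ff:ff:ff:ff:ff:ff", "-j", "RETURN"],
        ["-t", "nat", "-A", chain, "-j", "DROP"],
    ]
    print(build_batch(commands), end="")
    # One process start instead of one fork/exec cycle per command.
    print(f"{len(commands)} commands, 1 invocation")
```

How the batch is handed to the binary (arguments, file, or stdin) is left open here, as it is in the proposal; the point is only that the per-command fork/exec cost is paid once per table update instead of once per rule.
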