Re: [RFC] Faster libvirtd restart with nwfilter rules

Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> · Mon, 1 Oct 2018 15:19:33 +0300



ping

On 24.09.2018 10:41, Nikolay Shirokovskiy wrote:
> Hi, all.
>
>   On large hosts capable of running hundreds of VMs, restarting libvirtd
> makes its services unavailable for a long time if the VMs use network
> filters. In my tests each of 100 VMs has the no-promisc [1] and
> no-mac-spoofing filters, and executing virsh list right after a daemon
> restart takes approximately 140s if firewalld is not running (that is,
> ebtables/iptables/ip6tables commands are used to configure the kernel
> tables).
>   
>   The problem is that the daemon does not even start reading from client
> connections because the state drivers are not initialized. Initialization
> is blocked in state driver autostart, which grabs VM locks, and the VM
> locks are held by the VM reconnection code. Each VM reloads its network
> tables on reconnection, and this reloading is serialized on updateMutex in
> the gentech nwfilter driver. Working around autostart won't help much: even
> if the state drivers initialize, listing VMs won't be possible because
> listing takes each VM lock one by one too. However, managing a VM that has
> passed the reconnection phase will be possible, which takes the same 140s
> in the worst case.
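
To put numbers on the paragraph above, here is a toy model, not libvirt code.
It assumes every VM's filter reload costs the same, so the measured 140s over
100 VMs becomes an average of 1400 ms per VM. The function names are made up
for illustration.

```python
# Toy model of the blocked restart; not libvirt code.
# Assumption: each VM's reload costs the same (140 s / 100 VMs = 1400 ms).

def reconnect_finish_times_ms(n_vms, per_vm_ms):
    """Time at which each VM drops its lock when reloads are serialized."""
    return [per_vm_ms * (i + 1) for i in range(n_vms)]


def list_ready_ms(finish_times):
    """Listing takes every VM lock in turn, so it waits for the last one."""
    return max(finish_times)


times = reconnect_finish_times_ms(100, 1400)
print(times[0])              # 1400: the first VM becomes manageable
print(list_ready_ms(times))  # 140000: only now can virsh list answer
```

Under this assumption a single VM may become manageable early, but anything
that walks all VM locks waits for the full serialized reload.
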
>   
>   Note that this issue applies only to filter configurations that don't
> need ip learning. When ip learning is needed the situation is different,
> because the reconnection code spawns a new thread that applies network
> rules only after the ip is learned from traffic, and this thread does not
> grab the VM lock. As a result VMs are manageable, but reloading filters in
> the background takes approximately the same 140s. I guess managing network
> filters during this period can have issues too. Anyway, this situation does
> not look good either, so fixing the described issue by spawning threads
> even without ip learning does not look nice to me.
>   
>   What speedup is possible with a conservative approach? First, for test
> purposes, we can remove the firewall ruleLock, the gentech driver
> updateMutex and the filter object mutex, which serve no function in the
> restart scenario. This gives a 36s restart time. The speedup is achieved
> because the heavy fork/preexec steps now run concurrently.
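
A minimal sketch of the effect described above, assuming a thread per pending
rule update. update_mutex is only a stand-in for the driver-wide locks, and
NOOP_CMD is a harmless placeholder child rather than a real ebtables call.
With serialize=True only one child is in flight at a time; without it the
spawns overlap.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

update_mutex = threading.Lock()  # stand-in for the driver-wide mutex

# Placeholder for an ebtables/iptables invocation: a child that does nothing.
NOOP_CMD = [sys.executable, "-c", "pass"]


def apply_rule(cmd, serialize):
    """Spawn one child; hold the global mutex around it if requested."""
    if serialize:
        with update_mutex:
            return subprocess.run(cmd).returncode
    return subprocess.run(cmd).returncode


def reload_all(cmds, serialize, workers=8):
    """Run one task per command and return the exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: apply_rule(c, serialize), cmds))
```

Timing reload_all([NOOP_CMD] * 100, True) against the same call with False
should show the same kind of gap on a multi-core host, though the exact ratio
depends on the machine.
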
> 
> Next we can try to reduce fork/preexec time. To estimate its contribution
> alone, let's bring back the above locks. It turns out that most of the time
> is taken by fork itself and by closing 8k (on my system) file descriptors
> in preexec. Using vfork gives a 2x boost, and so does dropping the mass
> close. (I checked the mass close contribution because I don't quite
> understand the purpose of this step: libvirt typically sets the
> close-on-exec flag on its descriptors.) So these two optimizations alone
> can result in a restart time of 30s.
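
To illustrate the close-on-exec remark: the sketch below walks a descriptor
range the way a mass close would and reports only the descriptors that would
actually survive exec. It is a generic POSIX illustration, not what libvirt's
preexec code does; inheritable_fds and the limit of 1024 are assumptions for
the example.

```python
import fcntl
import os


def inheritable_fds(limit=1024):
    """Descriptors below `limit` that are open and lack FD_CLOEXEC."""
    found = []
    for fd in range(limit):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            continue  # not an open descriptor
        if not flags & fcntl.FD_CLOEXEC:
            found.append(fd)
    return found


fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)
print(fd in inheritable_fds())   # False: exec would close it anyway
os.set_inheritable(fd, True)     # clears FD_CLOEXEC
print(fd in inheritable_fds())   # True: only such fds need an explicit close
os.close(fd)
```

If every descriptor carries the flag, the loop over thousands of numbers in
the child finds nothing to do, which is why its cost looks avoidable.
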
> 
> Unfortunately, combining the above two approaches does not multiply their
> individual speedups. The reason is that, due to concurrency and the high
> number of VMs (100), the preexec boost does not play a significant role,
> and using vfork diminishes concurrency as it freezes all parent threads
> before execve. So dropping locks and closes gives a 33s restart time, and
> adding vfork to this gives a 25s restart time.
> 
> Another approach is to use the --atomic-file option of ebtables
> (iptables/ip6tables unfortunately do not have one). The idea is to save the
> table to a file, edit the file, and commit the table to the kernel. I hoped
> this could give a performance boost because we would not need to load/store
> the kernel network table for every single rule update. In order to isolate
> the approaches I also dropped all ip/ip6 updates, which cannot be done this
> way. In this approach we cannot drop ruleLock in the firewall because no
> other VM thread should change the tables between save and commit. This
> approach gives a restart time of 25s. But it is broken anyway, as we cannot
> be sure that another application doesn't change the network table between
> save and commit, in which case its changes will be lost.
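
For reference, the save/edit/commit sequence can be sketched as the command
lines below. The helper only builds argument lists and runs nothing. The chain
name I-vnet0 and the scratch path are hypothetical; --atomic-file,
--atomic-save and --atomic-commit are the ebtables options referred to above.

```python
def atomic_update_argv(table, atomic_file, rules):
    """Build the save / edit / commit command lines; nothing is executed."""
    base = ["ebtables", "-t", table, "--atomic-file", atomic_file]
    cmds = [base + ["--atomic-save"]]          # kernel table -> file
    cmds += [base + rule for rule in rules]    # edits touch only the file
    cmds.append(base + ["--atomic-commit"])    # file -> kernel table
    return cmds


# Hypothetical chain name and scratch path, for illustration only.
rules = [["-A", "I-vnet0", "-d", "ff:ff:ff:ff:ff:ff", "-j", "RETURN"]]
for argv in atomic_update_argv("nat", "/tmp/nat.table", rules):
    print(" ".join(argv))
```

Whatever another program writes to the kernel table between the first and the
last command is overwritten by the commit, which is the race noted above.
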
> 
> After all, I think we need to move in a different direction. We can add an
> API to all the binaries and to firewalld to execute many commands in one
> run. We can pass the commands as arguments or write them into a file which
> is then given to the binary. Then libvirt itself can update, for example,
> the bridge network table in a couple of commands. The exact number depends
> on the new API. For example, if we add an option to delete chains
> recursively and an option not to fail on a NOENT error, we can change the
> table in one command (no listing of current rules is required).
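
One way the "many commands in one run" idea could look from the libvirt side
is sketched below. Everything here is hypothetical: no existing binary accepts
this format, and the @ignore-missing and @recursive markers are invented
placeholders for the two proposed options.

```python
from dataclasses import dataclass


@dataclass
class Op:
    args: list                    # ordinary rule/chain arguments
    ignore_missing: bool = False  # hypothetical "do not fail on NOENT"
    recursive: bool = False       # hypothetical "delete chain recursively"


def render_batch(ops):
    """Serialize operations into one text blob, one command per line."""
    lines = []
    for op in ops:
        words = list(op.args)
        if op.ignore_missing:
            words.append("@ignore-missing")
        if op.recursive:
            words.append("@recursive")
        lines.append(" ".join(words))
    return "\n".join(lines) + "\n"


# Hypothetical chain name; drop the old chain if present, then rebuild it.
ops = [
    Op(["-X", "I-vnet0"], ignore_missing=True, recursive=True),
    Op(["-N", "I-vnet0"]),
    Op(["-A", "I-vnet0", "-d", "ff:ff:ff:ff:ff:ff", "-j", "RETURN"]),
]
print(render_batch(ops), end="")
```

The point of the two flags is that the batch can be written without first
listing the current rules, so a whole table update costs a single spawn.
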
> 
> [1] no-promisc filter
> 
> <filter name='no-promisc' chain='root' priority='-750'>
>   <uuid>6d055022-1192-4a3d-ae1f-576baa5564b6</uuid>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='ff:ff:ff:ff:ff:ff'/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='$MAC'/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00'/>
>   </rule>
>   <rule action='drop' direction='in' priority='500'>
>     <mac/>
>   </rule>
>   <rule action='return' direction='in' priority='500'>
>     <mac dstmacaddr='01:00:5e:00:00:00' dstmacmask='ff:ff:ff:80:00:00'/>
>   </rule>
> </filter>
> 
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list
> 
