On Tue, Nov 15, 2022 at 1:27 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> *snip*
>
> Anyway, I'm aware of big cloud providers who are pretty happy with live
> migration in production.

I could see someone sufficiently cloudbrained deciding that rebooting the
hypervisor is fine provided the downtime doesn't violate any customer
uptime SLAs. Personally I'd only be brave enough to do that for a HV
hosting internal services which I know are behind a load balancer, but
apparently there are people at Huawei far braver than I.

> *snip*
>
> > Adding 2K+ VFs to sysfs needs too much time.
> >
> > Look at the bottom half of the hypervisor live update:
> > kexec --> add 2K VFs --> restore VMs
> >
> > The downtime can be reduced if the sequence is:
> > kexec --> add 100 VFs (the ones the VMs use) --> restore VMs --> add 1.9K VFs
>
> Addition of VFs is a serial operation; you can fire up your VMs once you
> have counted 100 VFs in the sysfs directory.

I don't know if making that kind of assumption about the behaviour of
sysfs is better or worse than just adding another knob. If at some point
in the future the initialisation of VF pci_devs was moved to a workqueue
or something, we'd be violating that assumption without breaking any of
the documented ABI. I guess you could argue that VFs being added
sequentially is "ABI", but userspace has always been told not to make
assumptions about when sysfs attributes (or nodes, I guess) appear,
since doing so is prone to races.
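For what it's worth, the "count VFs in sysfs" approach Leon describes could be sketched roughly like this. This is only an illustration of the assumption under discussion, not something from the thread: `count_vfs`, `wait_for_vfs`, the PF address, and the poll interval are all made up, and the only thing it relies on is the standard `virtfn*` symlink layout under a PF's sysfs directory.

```shell
# Hypothetical helper: count the VF entries (virtfn0, virtfn1, ...)
# that the PCI core creates under a PF's sysfs directory.
count_vfs() {
    ls -d "$1"/virtfn* 2>/dev/null | wc -l
}

# Usage sketch: poll until enough VFs have appeared, then restore VMs.
# This is exactly the racy assumption criticised above -- it only works
# if VF pci_devs keep being registered serially and synchronously.
wait_for_vfs() {
    pf=$1
    needed=$2
    while [ "$(count_vfs "$pf")" -lt "$needed" ]; do
        sleep 0.1
    done
}

# e.g. (PF address is a made-up example):
#   wait_for_vfs /sys/bus/pci/devices/0000:03:00.0 100 && restore_vms
```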