On 07.11.21 11:21, Michael S. Tsirkin wrote:
> On Sun, Nov 07, 2021 at 10:21:33AM +0100, David Hildenbrand wrote:
>> Let's not focus on b), a) is the primary goal of this series:
>>
>> "
>> a) Reduce the metadata overhead, including bitmap sizes inside KVM but
>> also inside QEMU KVM code where possible.
>> "
>>
>> Because:
>>
>> "
>> For example, when starting a VM with a 1 TiB virtio-mem device that only
>> exposes little device memory (e.g., 1 GiB) towards the VM initially,
>> in order to hotplug more memory later, we waste a lot of memory on
>> metadata for KVM memory slots (> 2 GiB!) and accompanied bitmaps.
>> "
>>
>> Partially tackling b) is just a nice side effect of this series. In the
>> long term, we'll want userfaultfd-based protection, and I'll do a
>> performance evaluation then, how userfaultfd vs. !userfaultfd compares
>> (boot time, run time, THP consumption).
>>
>> I'll adjust the cover letter for the next version to make this clearer.
>
> So given this is short-term, and long term we'll use uffd possibly with
> some extension (a syscall to populate 1G in one go?) isn't there some
> way to hide this from management? It's a one way street: once we get
> management involved in playing with memory slots we no longer can go
> back and control them ourselves. Not to mention it's a lot of
> complexity to push out to management.

For b) userfaultfd + optimizations is the way to go long term.
For a) userfaultfd does not help in any way, and that's what I currently
care about most.

1) For the management layer it will be as simple as providing a
"memslots" parameter to the user. I don't expect management to do manual
memslot detection+calculation -- management layer is the wrong place
because it has limited insight. Either QEMU will do it automatically or
the user will do it manually.
For QEMU to do it reliably, we'll have to teach the management layer to
specify any vhost* devices before virtio-mem* devices on the QEMU
cmdline -- that is the only real complexity I see.

2) "control them ourselves" will essentially be enabled via "memslots=0"
(auto-detect mode). The user has to opt in.

"memslots" is a pure optimization mechanism. While I'd love to hide this
complexity from user space and always use the auto-detect mode,
hotplug of vhost devices in particular is a real problem and requires
users to opt in.

I assume once we have "memslots=0" (auto-detect) mode, most people will:
* Set "memslots=0" to enable the optimization and essentially let QEMU
  control it. This will work in most cases, and we can document precisely
  where it won't. We'll always fail gracefully.
* Leave "memslots=1" if they don't care about the optimization or run a
  problematic setup.
* Set "memslots=X" if they run a problematic setup and still care about
  the optimization.

To be precise, we could have a "memslots-optimization=true|false" toggle
instead. IMHO that could be limiting for the corner-case setups where
auto-detection is problematic and users still want to optimize --
especially when eventually hotplugging vhost devices. But as I assume
99.9999% of all setups will enable auto-detect mode, I don't have a
strong opinion.

-- 
Thanks,

David / dhildenb
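
As a rough illustration of goal a), the following sketch estimates how
the metadata cost scales with the amount of device memory covered by
memslots. It is a back-of-the-envelope model, not QEMU or KVM code. The
figure of 8 bytes of metadata per 4 KiB page is an assumption chosen to
be consistent with the "> 2 GiB" number quoted above, and the helper
names, the equal-sized memslot split, and the idea that plugged memory
sits contiguously at the start of the device region are all
hypothetical simplifications.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

PAGE_SIZE = 4 * KIB   # assumed base page size
BYTES_PER_PAGE = 8    # assumed metadata cost per page (hypothetical figure)


def mapped_bytes(device_size: int, plugged: int, memslots: int) -> int:
    """Bytes covered by memslots when the device region is split into
    `memslots` equal parts and only the parts overlapping plugged memory
    are mapped (plugged memory assumed contiguous from offset 0)."""
    slot_size = device_size // memslots
    slots_needed = -(-plugged // slot_size)  # ceiling division
    return slots_needed * slot_size


def metadata_bytes(covered: int) -> int:
    """Estimated per-page metadata for `covered` bytes of memslot space."""
    return (covered // PAGE_SIZE) * BYTES_PER_PAGE


# One memslot spanning the whole 1 TiB device, 1 GiB plugged:
single = metadata_bytes(mapped_bytes(TIB, GIB, memslots=1))
# The same device split into 1024 memslots of 1 GiB each:
split = metadata_bytes(mapped_bytes(TIB, GIB, memslots=1024))

print(single // GIB, "GiB")  # 2 GiB
print(split // MIB, "MiB")   # 2 MiB
```

Under these assumptions, the cost follows the size of the mapped
memslots rather than the amount of memory the VM can actually use,
which is why covering only the plugged part of the device with
memslots reduces the overhead.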