On Thu, 2022-05-26 at 14:01 +0200, Dario Faggioli wrote:
> Thoughts?
>
Oh, and there are even a couple of other (potential) use cases for having an (even more!) fine-grained control of core-scheduling.

So, right now, giving a virtual topology to a VM pretty much only makes sense if the VM has its vcpus pinned. Well, actually, there is something we can do even if that is not the case, especially if we define at least *some* constraints on where the vcpus can run, even without strict and static 1-to-1 pinning... But for sure we should not define an SMT topology if we do not have that (i.e., if we do not have strict and static 1-to-1 pinning). And yet, the vcpus will run on cores and threads!

Now, if we implement per-vcpu core-scheduling (which means being able to put not necessarily whole VMs, but single vcpus [although, of the same VM], in trusted groups), then we can:
- put vcpu0 and vcpu1 of VM1 in a group
- put vcpu2 and vcpu3 of VM1 in a(nother!) group
- define, in the virtual topology of VM1, vcpu0 and vcpu1 as SMT-threads of the same core
- define, in the virtual topology of VM1, vcpu2 and vcpu3 as SMT-threads of the same core

From the perspective of the accuracy of the mapping between virtual and physical topology (and hence, most likely, of performance), it's still a mixed bag. I.e., on an idle or lightly loaded system, vcpu0 and vcpu1 can still run on two different cores. So, if the guest kernel and apps assume that the two vcpus are SMT-siblings, and optimize for that, well, that might still be false/wrong (just as it would be without any core-scheduling, without any pinning, etc.). At least, when they run on different cores, they run there alone, which is nice (but achievable with per-VM core-scheduling already).

On a heavily loaded system, instead, vcpu0 and vcpu1 should (when they both want to run) have much higher chances of actually ending up running on the same core. [Of course, not necessarily always on one same specific core --like when we do pinning-- but always together on the same core.] So, in-guest workloads operating under the assumption that those two vcpus are SMT-siblings will hopefully benefit from that.

And for the lightly loaded case, well, I believe that combining per-vcpu core-scheduling + SMT virtual topology with *some* kind of vcpu affinity (and I mean something more flexible and less wasteful than 1-to-1 pinning, of course!) and/or with something like numad, will actually bring some performance and determinism benefits, even in such a scenario... But, of course, we need data for that, and I don't have any yet. :-)

Anyway, let's now consider the case where the user/customer wants to be able to use core-scheduling _inside_ the guest, e.g., for protecting and/or shielding some sensitive workload that he/she is running inside the VM itself, from all the other tasks. But for using core-scheduling inside the guest, we need the guest to have cores and threads. And for the protection/shielding to be effective, we need to be sure that, say, if two guest tasks are in the same trusted group and are running on two vcpus that are virtual SMT-siblings, these two vcpus either (1) run on two actual physical SMT-sibling pCPUs on the host (i.e., they run on the same core), or (2) run on different host cores, each one on a thread, with no other vcpu from any other VM (and no host task, for that matter) running on the other thread. And this is exactly what per-vcpu core-scheduling + SMT virtual topology gives us. :-D
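Just to make the "trusted group" part a bit more concrete: here is a minimal sketch of how two vcpu threads could be put in the same core-scheduling group on the host, using the PR_SCHED_CORE prctl() interface that the kernel has had since 5.14. This is only an illustration of the mechanism, not a proposal for the actual toolstack interface; the two TIDs are placeholders that a real tool would fetch from, e.g., QMP's query-cpus-fast, or from libvirt.

/*
 * Sketch: give <vcpu0-tid> and <vcpu1-tid> the same core-scheduling
 * cookie, so that they are only ever co-scheduled with each other on
 * an SMT core. Needs ptrace-level access to the target threads
 * (i.e., same user or root).
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/prctl.h>

static void core_sched(int op, pid_t tid)
{
    if (prctl(PR_SCHED_CORE, op, tid, PR_SCHED_CORE_SCOPE_THREAD, 0)) {
        perror("prctl(PR_SCHED_CORE)");
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <vcpu0-tid> <vcpu1-tid>\n", argv[0]);
        return 1;
    }

    pid_t vcpu0 = atoi(argv[1]);
    pid_t vcpu1 = atoi(argv[2]);

    /* Create a fresh cookie for vcpu0... */
    core_sched(PR_SCHED_CORE_CREATE, vcpu0);
    /* ...pull it into this helper... */
    core_sched(PR_SCHED_CORE_SHARE_FROM, vcpu0);
    /* ...and push it to vcpu1, so both threads share the cookie. */
    core_sched(PR_SCHED_CORE_SHARE_TO, vcpu1);

    return 0;
}

And the very same prctl() interface is what the guest kernel would expose for the "shielding a sensitive workload inside the VM" scenario above, of course, provided the guest sees an SMT topology in the first place.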
Of course, as in the previous message, I think that it's perfectly fine for something like this to not be implemented immediately, and come later. At least as long as we don't do anything at this stage that will prevent, or make it difficult, to implement such extensions in the future. Which I guess is, after all, the main point of these very long emails (sorry!) that I am writing. I.e., _if_ we agree that it might be interesting to have per-VM, or even per-vcpu, core-scheduling in the future, let's just try to make sure that what we put together now (especially at the interface level) is easy to extend in that direction. :-)

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)