On 02.02.2018 15:15, Eduardo Habkost wrote:
> On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
>> On 01.02.2018 21:26, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
>>>> 2018-02-01 12:54-0500, Luiz Capitulino:
>>>>>
>>>>> Libvirt needs to know when a vCPU is halted. To get this information,
>>>>
>>>> I don't see why upper level management should care about that, a single
>>>> bit about halted state that can be incorrect at the time it is processed
>>>> seems of very limited use.
>>>
>>> I don't see why, either.
>>>
>>> I'm CCing libvir-list and the people involved in the code that
>>> added halt state to libvirt domain statistics.
>>>
>> I'll try to explain the motivation for the "halted" state exposure and
>> why it ended up in the libvirt domain stats.
>>
>> s390 CPUs can be present in a system (e.g. after being hotplugged) but
>> be offline (disabled) in which case they are not used by the operating
>> system. In Linux disabled CPUs show a value of '0' in
>> /sys/devices/system/cpu/cpu<n>/online.
>>
>> Higher level management software (on top of libvirt) can take advantage
>> of knowing whether a guest CPU is online and thus used or not.
>> Specifically it might not make sense to plug more CPUs if the guest OS
>> isn't using the CPUs at all.
>
> Wasn't this already represented on "vcpu.<n>.state"? Why is
> "vcpu.<n>.halted" needed?

The state would match that of vcpuinfo, and there was consensus not to
change it (on x86 the CPU is in state running, even if halted).

>
>>
>> A disabled guest CPU is represented as halted in the QEMU object model
>> and can therefore be identified by the QMP query-cpus command.
>>
>> The initial patch proposal to expose this via virsh vcpuinfo was not
>> considered to be desirable because there was a concern that legacy
>> management software might be confused seeing halted vcpus. Therefore the
>> state information was added to the cpu domain statistics.
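
For illustration only, here is a minimal sketch of how the guest-side
online flag mentioned above (/sys/devices/system/cpu/cpu<n>/online)
could be read from inside a Linux guest. It is not libvirt or QEMU code.
The function name guest_cpu_online and the sysfs_cpu_dir parameter are
hypothetical, and a missing or unreadable "online" file is reported as
None rather than guessed at.

```python
import os
import re


def guest_cpu_online(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return {cpu_index: True | False | None} based on cpu<n>/online.

    True  -> the file contains '1' (CPU online)
    False -> the file contains anything else, e.g. '0' (CPU disabled)
    None  -> the file is missing or unreadable for this CPU
    Hypothetical helper, meant to run inside the guest.
    """
    result = {}
    for name in os.listdir(sysfs_cpu_dir):
        match = re.fullmatch(r"cpu(\d+)", name)
        if not match:
            continue  # skip entries such as 'cpufreq' or 'possible'
        index = int(match.group(1))
        path = os.path.join(sysfs_cpu_dir, name, "online")
        try:
            with open(path) as f:
                result[index] = f.read().strip() == "1"
        except OSError:
            result[index] = None
    return result
```

A management agent in the guest could compare the number of True
entries against the number of plugged vCPUs before asking for more.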
>>
>> One issue we're facing is that the semantics of "halted" are different
>> between s390 and at least x86. The question might be whether they are
>> different enough to warrant a specific "disabled" indicator.
>
> From your description, it looks like they are completely
> different. On x86, a CPU that is online and in use can be moved
> between halted and non-halted state many times a second.
>
> If that's the case, we can probably fix this without breaking
> existing code: explicitly documenting the semantics of
> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> online" (i.e. the s390 semantics, not the x86 one), and making
> qemuMonitorGetCpuHalted() s390-specific.
>
> Possibly a better long-term solution is to deprecate
> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> s390.
>

As it seems that nobody was ever *really* interested in x86.halted, one
could also return 0 unconditionally there (and for other
expensive-to-query arches)?

> It would be also interesting to update QEMU QMP documentation to
> clarify the arch-specific semantics of "halted".
>

--
Regards,
 Viktor Mihajlovski