On 06/14/2018 04:10 PM, John Ferlan wrote:
>
> [...]
>
>>>> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
>>>> index da773b76cb..1a1d34620d 100644
>>>> --- a/include/libvirt/libvirt-domain.h
>>>> +++ b/include/libvirt/libvirt-domain.h
>>>> @@ -2055,6 +2055,7 @@ typedef enum {
>>>>      VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF = VIR_CONNECT_LIST_DOMAINS_SHUTOFF,
>>>>      VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER = VIR_CONNECT_LIST_DOMAINS_OTHER,
>>>>
>>>> +    VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT = 1 << 29, /* ignore stalled domains */
>>>
>>> "stalled"? How about "don't wait on other jobs"
>>
>> Well, my hidden idea was also that we can "misuse" this flag to not wait
>> in other places too. For instance, if we find out (somehow) that a
>> domain is in D state, we would consider it stale even without any job
>> running on it. Okay, we have no way of detecting if qemu is in D state
>> right now, but you get my point. If we don't lock this flag down to just
>> domain jobs (which not all drivers have, btw), we can use it more widely.
>>
>
> Ahhh, I see what you're driving at - some flag name which allows us to
> reuse the name for other conditions which have caused delays in
> collection due to some underlying condition...
>
> a/k/a: {AVOID|IGNORE}_BLOCKING_CONDITIONS...
>
>>>
>>>>      VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING = 1 << 30, /* include backing chain for block stats */
>>>>      VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS = 1U << 31, /* enforce requested stats */
>
> Would the new flag be mutually exclusive with ENFORCE? I want some list
> of stats, but don't wait to get the answer.

I thought about this and I don't think so. The current behaviour is that
even with ENFORCE you might still miss some stats if acquiring the job
fails. ENFORCE merely tells libvirt to check whether the requested stats
are available (i.e. whether the driver knows how to fetch them).
Therefore I don't think these two flags are mutually exclusive.
>
>>>> } virConnectGetAllDomainStatsFlags;
>>>> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
>>>> index d44b553c74..906b097adb 100644
>>>> --- a/src/libvirt-domain.c
>>>> +++ b/src/libvirt-domain.c
>>>> @@ -11502,6 +11502,11 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
>>>>  * fields for offline domains if the statistics are meaningful only for a
>>>>  * running domain.
>>>>  *
>>>> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
>>>> + * @flags means when libvirt is unable to fetch stats for any of
>>>> + * the domains (for whatever reason) such domain is silently
>>>> + * ignored.
>>>> + *
>>>
>>> So we're adding this for both capabilities and...
>>
>> Not really, this is just git generating a misleading diff (for a human
>> reader). In fact this flag is added to virConnectGetAllDomainStats()
>> and virDomainListGetStats().
>>
>>>
>>>>  * Similarly to virConnectListAllDomains, @flags can contain various flags to
>>>>  * filter the list of domains to provide stats for.
>>>>  *
>>>> @@ -11586,6 +11591,11 @@ virConnectGetAllDomainStats(virConnectPtr conn,
>>>>  * fields for offline domains if the statistics are meaningful only for a
>>>>  * running domain.
>>>>  *
>>>> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
>>>> + * @flags means when libvirt is unable to fetch stats for any of
>>>> + * the domains (for whatever reason) such domain is silently
>>>> + * ignored.
>>>> + *
>>>
>>> ...stats in the API documentation, but...
>>>
>>> BTW: the domain isn't silently ignored, instead only a subset of
>>> statistics is returned for the domain. That subset being statistics
>>> that don't involve querying the underlying hypervisor.
>>
>> Okay.
>>
>>>
>>>>  * Note that any of the domain list filtering flags in @flags may be rejected
>>>>  * by this function.
>>>>  *
>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>> index 38ea865ce3..28338de05f 100644
>>>> --- a/src/qemu/qemu_driver.c
>>>> +++ b/src/qemu/qemu_driver.c
>>>> @@ -20446,6 +20446,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
>>>
>>> ...only adding the check in the DomainStats?
>>
>> Sure. virDomainListGetStats calls driver->connectGetAllDomainStats()
>> just like virConnectGetAllDomainStats() does, so on the driver side
>> there is only one function serving both public APIs.
>>
>
> The ... are all related - you've answered the concern above. I was too
> lazy to look at the actual placement - just took the git diff output and
> assumed Capabilities.
>
>>>
>>>>      virCheckFlags(VIR_CONNECT_LIST_DOMAINS_FILTERS_ACTIVE |
>>>>                    VIR_CONNECT_LIST_DOMAINS_FILTERS_PERSISTENT |
>>>>                    VIR_CONNECT_LIST_DOMAINS_FILTERS_STATE |
>>>> +                  VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT |
>>>>                    VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING |
>>>>                    VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS, -1);
>>>>
>>>> @@ -20480,9 +20481,17 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
>>>>
>>>>          virObjectLock(vm);
>>>>
>>>> -        if (HAVE_JOB(privflags) &&
>>>> -            qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) == 0)
>>>> -            domflags |= QEMU_DOMAIN_STATS_HAVE_JOB;
>>>
>>> Existing, but if BeginJob fails for some "other" reason, then one
>>> wonders how much of the next steps actually happen.
>>
>> I'm not sure what you mean. Any StatsWorker callback (from
>> qemuDomainGetStats()) that needs to talk to the monitor checks whether
>> grabbing the job succeeded. If it didn't, the callback returns
>> immediately. You can see this live - just put sleep(60) right at the
>> beginning of qemuAgentGetTime(), then from one console run 'virsh
>> domtime $dom' and from another 'virsh domstats'.
>>
>>> Furthermore, we
>>> don't clear the LastError.
>>>
>>>> +        if (HAVE_JOB(privflags)) {
>>>> +            int rv;
>>>> +
>>>> +            if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT)
>>>> +                rv = qemuDomainObjBeginJobInstant(driver, vm, QEMU_JOB_QUERY);
>>>
>>> Let's assume rv = -1 for a moment and it's the last domain that caused
>>> the failure - does that mean the caller then re...
>>
>> unfinished sen... :-)
>>
>
> Darn those interruptions - I think in the end this was all related to
> the clearing of the error message. I know I was concerned that for this
> avoid-wait condition we wouldn't message. For this case, it's handled
> by ignoring it, but perhaps some other future consumer would need to
> know it has to message that the call failed. It would then need to
> check if the last message was NULL, then generate a message.

Yup. But again, this is pre-existing and deserves a separate patch. I
can send it as a follow up once these are merged - also to motivate
others to merge these ;-)

>
> Of course then I got into the - hey wait, we don't clear the last error
> in the event that we're ignoring it and naturally neglected to go back
> to my first train of thought to complete my "tence" (;-)).

:-D

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list