Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>> Note:
>>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits.
>>>
>>> For these problems, if you have nice thought, pls let us know.:)
>>>
>>> Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>> ---
>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  block.h |    1 -
>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>
>> One general comment: What about synchronous and/or coroutine I/O
>> operations? Do you think they are just not important enough to consider
>> here or were they forgotten?
> For sync ops, we assume that it will be converse into async mode at
> some point of future, right?
> For coroutine I/O, it is introduced in image driver layer, and behind
> bdrv_aio_readv/writev. I think that we need not consider them, right?

Meanwhile the block layer has been changed to handle all requests in
terms of coroutines. So you would best move your intercepting code into
the coroutine functions.

>> Also, do I understand correctly that you're always submitting the whole
> Right, when the block timer fire, it will flush whole request queue.
>> queue at once? Does this effectively enforce the limit all the time or
>> will it lead to some peaks and then no requests at all for a while until
> In fact, it only try to submit those enqueued request one by one. If
> fail to pass the limit, this request will be enqueued again.

Right, I missed this. Makes sense.

>> the average is right again?
> Yeah, it is possible. Do you better idea?
>>
>> Maybe some documentation on how it all works from a high level
>> perspective would be helpful.
>>
>>> +    /* throttling disk read I/O */
>>> +    if (bs->io_limits_enabled) {
>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>> +            printf("wait_time=%ld\n", wait_time);
>>> +            if (wait_time != -1) {
>>> +                printf("reset block timer\n");
>>> +                qemu_mod_timer(bs->block_timer,
>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>> +            }
>>> +
>>> +            if (ret) {
>>> +                printf("ori ret is not null\n");
>>> +            } else {
>>> +                printf("ori ret is null\n");
>>> +            }
>>> +
>>> +            return ret;
>>> +        }
>>> +    }
>>>
>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>                                 cb, opaque);
>>> +    if (ret) {
>>> +        if (bs->io_limits_enabled) {
>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>> +        }
>>
>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>> second counting mechanism. Would have the advantage that numbers are
> NO, our counting variables will be reset to ZERO if current slice
> time(0.1ms) is used up.

Instead of setting the counter to zero you could remember the base value
and calculate the difference when you need it. The advantage is that we
can share infrastructure instead of introducing several subtly different
ways of I/O accounting.

>> actually consistent (your metric counts slightly differently than the
>> existing info blockstats one).
> Yeah, i notice this, and don't think there's wrong with it. and you?

It's not really user friendly if a number that is called the same means
this in one place and in another place that.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux