On Mon, Oct 17, 2011 at 6:26 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
> Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
>> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>> Note:
>>>> 1.) When the bps/iops limits are set to a small value such as 511
>>>>     bytes/s, the VM will hang up. We are considering how to handle
>>>>     this scenario.
>>>> 2.) When a "dd" command is issued in the guest with a large block
>>>>     size such as "bs=1024K", the resulting speed will be slightly
>>>>     higher than the limits.
>>>>
>>>> If you have good ideas for these problems, please let us know. :)
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>>> ---
>>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  block.h |    1 -
>>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>>
>>> One general comment: What about synchronous and/or coroutine I/O
>>> operations? Do you think they are just not important enough to consider
>>> here or were they forgotten?
>> For sync ops, we assume that they will be converted to async mode at
>> some point in the future, right?
>> Coroutine I/O is introduced in the image driver layer, behind
>> bdrv_aio_readv/writev. I think we need not consider it, right?
>
> Meanwhile the block layer has been changed to handle all requests in
> terms of coroutines. So you would best move your intercepting code into
> the coroutine functions.
OK, I will.
>
>>> Also, do I understand correctly that you're always submitting the whole
>> Right, when the block timer fires, it will flush the whole request queue.
>>> queue at once? Does this effectively enforce the limit all the time or
>>> will it lead to some peaks and then no requests at all for a while until
>> In fact, it only tries to submit the enqueued requests one by one. If a
>> request fails to pass the limit, it will be enqueued again.
>
> Right, I missed this. Makes sense.
>
>>> the average is right again?
>> Yeah, it is possible. Do you have a better idea?
>>>
>>> Maybe some documentation on how it all works from a high level
>>> perspective would be helpful.
>>>
>>>> +    /* throttling disk read I/O */
>>>> +    if (bs->io_limits_enabled) {
>>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>>> +            printf("wait_time=%ld\n", wait_time);
>>>> +            if (wait_time != -1) {
>>>> +                printf("reset block timer\n");
>>>> +                qemu_mod_timer(bs->block_timer,
>>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>>> +            }
>>>> +
>>>> +            if (ret) {
>>>> +                printf("ori ret is not null\n");
>>>> +            } else {
>>>> +                printf("ori ret is null\n");
>>>> +            }
>>>> +
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>>
>>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>> +    ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>>                                cb, opaque);
>>>> +    if (ret) {
>>>> +        if (bs->io_limits_enabled) {
>>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>>> +        }
>>>
>>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>>> second counting mechanism. Would have the advantage that numbers are
>> No, our counting variables will be reset to zero when the current slice
>> time (0.1ms) is used up.
>
> Instead of setting the counter to zero you could remember the base value
> and calculate the difference when you need it. The advantage is that we
> can share infrastructure instead of introducing several subtly different
> ways of I/O accounting.
Good idea, let me try it.
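Something like the following, maybe? This is a rough, untested sketch of
what you suggest: io_base would be a hypothetical new field in
BlockDriverState, while nr_bytes and nr_ops are the existing blockstats
counters indexed by BDRV_ACCT_READ/BDRV_ACCT_WRITE:

    /* At the start of each slice, snapshot the shared blockstats
     * counters instead of resetting a private copy to zero. */
    static void bdrv_io_limits_start_slice(BlockDriverState *bs)
    {
        bs->io_base.bytes[BLOCK_IO_LIMIT_READ]  = bs->nr_bytes[BDRV_ACCT_READ];
        bs->io_base.bytes[BLOCK_IO_LIMIT_WRITE] = bs->nr_bytes[BDRV_ACCT_WRITE];
        bs->io_base.ios[BLOCK_IO_LIMIT_READ]    = bs->nr_ops[BDRV_ACCT_READ];
        bs->io_base.ios[BLOCK_IO_LIMIT_WRITE]   = bs->nr_ops[BDRV_ACCT_WRITE];
    }

    /* I/O done in the current slice is just the difference from the
     * snapshot, so bdrv_exceed_io_limits() could check it against
     * bps_limit * slice_time without any second counting mechanism. */
    static uint64_t bdrv_bytes_in_slice(BlockDriverState *bs, bool is_write)
    {
        int acct  = is_write ? BDRV_ACCT_WRITE : BDRV_ACCT_READ;
        int limit = is_write ? BLOCK_IO_LIMIT_WRITE : BLOCK_IO_LIMIT_READ;

        return bs->nr_bytes[acct] - bs->io_base.bytes[limit];
    }

That way info blockstats and the throttling code would count bytes in
exactly one place.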
>
>>> actually consistent (your metric counts slightly differently than the
>>> existing info blockstats one).
>> Yeah, I noticed this, and I don't think there's anything wrong with it.
>> What do you think?
>
> It's not really user friendly if a number with the same name means one
> thing in one place and another thing in another place.
OK.
>
> Kevin

--
Regards,

Zhi Yong Wu