On Mon, Oct 17, 2011 at 6:26 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
> Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
>> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>> Note:
>>>> 1.) When the bps/iops limits are set to a small value such as 511
>>>>     bytes/s, the VM will hang up. We are considering how to handle
>>>>     this scenario.
>>>> 2.) When a "dd" command is issued in the guest with a large block
>>>>     size such as "bs=1024K", the resulting speed will be slightly
>>>>     higher than the limits.
>>>>
>>>> If you have good ideas for these problems, please let us know. :)
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>>> ---
>>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  block.h |    1 -
>>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>>
>>> One general comment: What about synchronous and/or coroutine I/O
>>> operations? Do you think they are just not important enough to consider
>>> here or were they forgotten?
>> For sync ops, we assume that they will be converted to async mode at
>> some point in the future, right?
>> Coroutine I/O is introduced in the image driver layer, behind
>> bdrv_aio_readv/writev. I think we need not consider it, right?
>
> Meanwhile the block layer has been changed to handle all requests in
> terms of coroutines. So you would best move your intercepting code into
> the coroutine functions.
OK, I will.
>
>>> Also, do I understand correctly that you're always submitting the whole
>> Right, when the block timer fires, it will flush the whole request queue.
>>> queue at once? Does this effectively enforce the limit all the time or
>>> will it lead to some peaks and then no requests at all for a while until
>> In fact, it only tries to submit the enqueued requests one by one. If a
>> request fails to pass the limit, it will be enqueued again.
>
> Right, I missed this. Makes sense.
>
>>> the average is right again?
>> Yeah, it is possible. Do you have a better idea?
>>>
>>> Maybe some documentation on how it all works from a high level
>>> perspective would be helpful.
>>>
>>>> +    /* throttling disk read I/O */
>>>> +    if (bs->io_limits_enabled) {
>>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>>> +            printf("wait_time=%ld\n", wait_time);
>>>> +            if (wait_time != -1) {
>>>> +                printf("reset block timer\n");
>>>> +                qemu_mod_timer(bs->block_timer,
>>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>>> +            }
>>>> +
>>>> +            if (ret) {
>>>> +                printf("ori ret is not null\n");
>>>> +            } else {
>>>> +                printf("ori ret is null\n");
>>>> +            }
>>>> +
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>>
>>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>> +    ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>>                                cb, opaque);
>>>> +    if (ret) {
>>>> +        if (bs->io_limits_enabled) {
>>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>>> +        }
>>>
>>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>>> second counting mechanism. Would have the advantage that numbers are
>> No, our counting variables will be reset to zero when the current slice
>> time (0.1ms) is used up.
>
> Instead of setting the counter to zero you could remember the base value
> and calculate the difference when you need it. The advantage is that we
> can share infrastructure instead of introducing several subtly different
> ways of I/O accounting.
Good idea, let me try it.
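Something like the following, maybe? This is a rough, untested sketch of
what you suggest: io_base would be a hypothetical new field in
BlockDriverState, while nr_bytes and nr_ops are the existing blockstats
counters indexed by BDRV_ACCT_READ/BDRV_ACCT_WRITE:

    /* At the start of each slice, snapshot the shared blockstats
     * counters instead of resetting a private copy to zero. */
    static void bdrv_io_limits_start_slice(BlockDriverState *bs)
    {
        bs->io_base.bytes[BLOCK_IO_LIMIT_READ]  = bs->nr_bytes[BDRV_ACCT_READ];
        bs->io_base.bytes[BLOCK_IO_LIMIT_WRITE] = bs->nr_bytes[BDRV_ACCT_WRITE];
        bs->io_base.ios[BLOCK_IO_LIMIT_READ]    = bs->nr_ops[BDRV_ACCT_READ];
        bs->io_base.ios[BLOCK_IO_LIMIT_WRITE]   = bs->nr_ops[BDRV_ACCT_WRITE];
    }

    /* I/O done in the current slice is just the difference from the
     * snapshot, so bdrv_exceed_io_limits() could check it against
     * bps_limit * slice_time without any second counting mechanism. */
    static uint64_t bdrv_bytes_in_slice(BlockDriverState *bs, bool is_write)
    {
        int acct  = is_write ? BDRV_ACCT_WRITE : BDRV_ACCT_READ;
        int limit = is_write ? BLOCK_IO_LIMIT_WRITE : BLOCK_IO_LIMIT_READ;

        return bs->nr_bytes[acct] - bs->io_base.bytes[limit];
    }

That way info blockstats and the throttling code would count bytes in
exactly one place.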
>
>>> actually consistent (your metric counts slightly differently than the
>>> existing info blockstats one).
>> Yeah, I noticed this, and I don't think there's anything wrong with it.
>> What do you think?
>
> It's not really user friendly if a number with the same name means one
> thing in one place and another thing in another place.
OK.
>
> Kevin

--
Regards,

Zhi Yong Wu