On Tue, Jun 22, 2010 at 09:50:24PM +0200, Christian Brunner wrote:
> > while running tests with qemu-io I've been experiencing a lot of
> > messages when running a large writev request (several hundred MB in
> > a single call):
> >
> > 10.06.20 22:10:07.337108 b67dcb70 client4136.objecter pg 3.437e on [0] is laggy: 33
> > 10.06.20 22:10:07.337708 b67dcb70 client4136.objecter pg 3.2553 on [0] is laggy: 19
> > [...]
> >
> > Everything is working fine, though. I think that the large number of
> > queued requests is the cause for this behaviour and I would propose to
> > delay further requests (see attached patch).
> >
> > What do you think about it?
>
> It seems that the osd is lagging behind. The usleep might work for you
> as you avoid the pressure, but it's also somewhat random and will
> probably hurt performance on other setups. I'd rather see a
> configurable solution that lets you specify a total in-flight bytes or
> some other resizable window scheme.

I'm not sure if I understand what "lagging behind" means. If the
in-flight bytes are the sum of the sizes of all requests in the queue,
a solution could look like this (although it isn't configurable yet).

Christian

---
 block/rbd.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 10daf20..f87e84c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -50,6 +50,7 @@ int eventfd(unsigned int initval, int flags);
  */
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 
+#define MAX_QUEUE_SIZE 33554432 // 32MB
 
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
@@ -79,6 +80,7 @@ typedef struct BDRVRBDState {
     uint64_t size;
     uint64_t objsize;
     int qemu_aio_count;
+    uint64_t queuesize;
 } BDRVRBDState;
 
 typedef struct rbd_obj_header_ondisk RbdHeader1;
@@ -334,6 +336,7 @@ static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
     le64_to_cpus((uint64_t *) & header->image_size);
     s->size = header->image_size;
     s->objsize = 1 << header->options.order;
+    s->queuesize = 0;
 
     s->efd = eventfd(0, 0);
     if (s->efd < 0) {
@@ -443,6 +446,7 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
     int i;
 
     acb->aiocnt--;
+    acb->s->queuesize -= rcb->segsize;
     r = rados_aio_get_return_value(c);
     rados_aio_release(c);
     if (acb->write) {
@@ -560,6 +564,12 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
         rcb->segsize = segsize;
         rcb->buf = buf;
 
+        while (s->queuesize > MAX_QUEUE_SIZE) {
+            usleep(100);
+        }
+
+        s->queuesize += segsize;
+
         if (write) {
             rados_aio_create_completion(rcb, NULL,
                                         (rados_callback_t) rbd_finish_aiocb,
-- 
1.7.0.4
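
For comparison, here is a rough sketch of what a configurable in-flight byte window could look like if the submitter slept on a condition variable instead of polling with usleep. It is not part of the patch above and is untested against block/rbd.c. The names InflightWindow, window_init, window_acquire and window_release are hypothetical, and the sketch assumes that completion callbacks run on a different thread from the one submitting requests, so the counter needs a lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper (not in block/rbd.c): caps the bytes in flight. */
typedef struct InflightWindow {
    pthread_mutex_t lock;
    pthread_cond_t drained;  /* signalled whenever bytes are released */
    uint64_t in_flight;      /* bytes submitted but not yet completed */
    uint64_t limit;          /* configurable cap, e.g. 32 MB */
} InflightWindow;

static void window_init(InflightWindow *w, uint64_t limit)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->drained, NULL);
    w->in_flight = 0;
    w->limit = limit;
}

/* Submitter side: block until `bytes` more bytes fit into the window. */
static void window_acquire(InflightWindow *w, uint64_t bytes)
{
    pthread_mutex_lock(&w->lock);
    /*
     * Admit a request unconditionally when nothing is in flight, so a
     * single request larger than the limit cannot wait forever.
     */
    while (w->in_flight > 0 && w->in_flight + bytes > w->limit) {
        pthread_cond_wait(&w->drained, &w->lock);
    }
    w->in_flight += bytes;
    pthread_mutex_unlock(&w->lock);
}

/* Completion side: return `bytes` to the window and wake any waiter. */
static void window_release(InflightWindow *w, uint64_t bytes)
{
    pthread_mutex_lock(&w->lock);
    w->in_flight -= bytes;
    pthread_cond_broadcast(&w->drained);
    pthread_mutex_unlock(&w->lock);
}
```

In terms of the patch, window_acquire(w, segsize) would stand in for the usleep loop in rbd_aio_rw_vector, and window_release(w, rcb->segsize) for the subtraction in rbd_finish_aiocb. The limit would then come from a runtime option rather than a #define. Note that this still blocks the submitting thread while the window is full; it only removes the fixed polling interval and the unsynchronized counter.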