Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 24, 2017 at 1:38 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>>I just opened a tracker ticket for this [1]
>>>
>>>[1] http://tracker.ceph.com/issues/20070
>
> Thanks Jason!
>
>>>-- let me know if you have
>>>any other QEMU improvement ideas.
>
> For the moment, the bigger limitation is cpu usage of librbd,
> as qemu can only 1 thread, I can't reach more than around 70000 iops by disk.
> (3,1ghz cpu, disabling debug, rbd_cache, using jemalloc).
>
> So any improvement to reduce cpu usage could be great :)

We've had some interest expressed in volunteering time and code
contributions to reduce the CPU usage of librbd / librados. I am
hoping some of these improvements will be included in the Mimic
release.

> Also, in the future, I think qemu will support multiple iothreads by disk,
> I don't known if librbd is already ready for this ?

This is hopefully another improvement we can make in the Mimic release
timeframe. librbd already has an architecture to support multiple
internal IO threads, but we have it disabled for now due to lack of
thorough testing -- and for the fact that librados needs to utilize
multiple IO threads as well.

>
>
> ----- Mail original -----
> De: "Jason Dillaman" <jdillama@xxxxxxxxxx>
> À: "aderumier" <aderumier@xxxxxxxxx>
> Cc: "Ilya Dryomov" <idryomov@xxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxx>, "Hannes Reinecke" <hare@xxxxxxxx>
> Envoyé: Mercredi 24 Mai 2017 13:53:40
> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>
> I just opened a tracker ticket for this [1] -- let me know if you have
> any other QEMU improvement ideas.
>
> [1] http://tracker.ceph.com/issues/20070
>
> On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>> Hi,
>>
>> is it planned to implement write zeroes in qemu rbd block driver soon ?
>> (bdrv_co_write_zeroes)
>>
>> It's really missing currently, as qemu drive-mirror need it to have sparse images on copy.
>>
>> Ref from my discussion with Paolo from redhat in 2014 about this:
>> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html
>>
>>
>> REgards,
>>
>> Alexandre
>>
>> ----- Mail original -----
>> De: "Jason Dillaman" <jdillama@xxxxxxxxxx>
>> À: "Ilya Dryomov" <idryomov@xxxxxxxxx>
>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxx>, "Hannes Reinecke" <hare@xxxxxxxx>
>> Envoyé: Mardi 23 Mai 2017 20:28:00
>> Objet: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
>>
>> lgtm
>>
>> Reviewed-by: Jason Dillaman <dillaman@xxxxxxxxxx>
>>
>> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
>>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
>>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
>>>
>>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
>>> release either some or all blocks depending on whether the zeroing
>>> request is rbd_obj_bytes() aligned. This is how we currently implement
>>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
>>> now. Caveats:
>>>
>>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
>>> current implementations - nvme and loop
>>>
>>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
>>> is hence less helpful than blk_bio_discard_split(), but this can (and
>>> should) be fixed on the rbd side
>>>
>>> In the future we will split these into two code paths to respect
>>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
>>> released on discard.
>>>
>>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
>>> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
>>> ---
>>> drivers/block/rbd.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 454bf9c34882..c16f74547804 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work)
>>>
>>> switch (req_op(rq)) {
>>> case REQ_OP_DISCARD:
>>> + case REQ_OP_WRITE_ZEROES:
>>> op_type = OBJ_OP_DISCARD;
>>> break;
>>> case REQ_OP_WRITE:
>>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
>>> q->limits.discard_granularity = segment_size;
>>> q->limits.discard_alignment = segment_size;
>>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE);
>>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE);
>>>
>>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
>>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
>>> --
>>> 2.4.3
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Jason
>



-- 
Jason
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux