Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES


 



>>I just opened a tracker ticket for this [1] 
>>
>>[1] http://tracker.ceph.com/issues/20070

Thanks Jason!

>>-- let me know if you have
>>any other QEMU improvement ideas.

For the moment, the biggest limitation is the CPU usage of librbd:
since QEMU can only use 1 thread per disk, I can't reach more than around 70000 IOPS per disk
(3.1 GHz CPU, with debug and rbd_cache disabled, using jemalloc).

So any improvement that reduces CPU usage would be great :)


Also, I think QEMU will support multiple iothreads per disk in the future.
I don't know whether librbd is ready for this yet?




----- Original Message -----
From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "Ilya Dryomov" <idryomov@xxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxx>, "Hannes Reinecke" <hare@xxxxxxxx>
Sent: Wednesday, 24 May 2017 13:53:40
Subject: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES

I just opened a tracker ticket for this [1] -- let me know if you have 
any other QEMU improvement ideas. 

[1] http://tracker.ceph.com/issues/20070 

On Wed, May 24, 2017 at 7:38 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote: 
> Hi, 
> 
> Is there a plan to implement write zeroes in the QEMU rbd block driver soon?
> (bdrv_co_write_zeroes)
>
> It's really missing currently, as QEMU drive-mirror needs it to produce sparse images on copy.
>
> Reference: my discussion with Paolo from Red Hat in 2014 about this:
> https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01274.html 
> 
> 
> Regards,
> 
> Alexandre 
> 
> ----- Original Message -----
> From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
> To: "Ilya Dryomov" <idryomov@xxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxx>, "Hannes Reinecke" <hare@xxxxxxxx>
> Sent: Tuesday, 23 May 2017 20:28:00
> Subject: Re: [PATCH] rbd: implement REQ_OP_WRITE_ZEROES
> 
> lgtm 
> 
> Reviewed-by: Jason Dillaman <dillaman@xxxxxxxxxx> 
> 
> On Tue, May 23, 2017 at 11:08 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: 
>> Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
>> explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the 
>> following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data 
>> flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES. 
>> 
>> rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will 
>> release either some or all blocks depending on whether the zeroing 
>> request is rbd_obj_bytes() aligned. This is how we currently implement 
>> discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for 
>> now. Caveats: 
>> 
>> - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other 
>> current implementations - nvme and loop 
>> 
>> - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split() 
>> is hence less helpful than blk_bio_discard_split(), but this can (and 
>> should) be fixed on the rbd side 
>> 
>> In the future we will split these into two code paths to respect 
>> REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be 
>> released on discard. 
>> 
>> Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag") 
>> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> 
>> --- 
>> drivers/block/rbd.c | 2 ++ 
>> 1 file changed, 2 insertions(+) 
>> 
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c 
>> index 454bf9c34882..c16f74547804 100644 
>> --- a/drivers/block/rbd.c 
>> +++ b/drivers/block/rbd.c 
>> @@ -4023,6 +4023,7 @@ static void rbd_queue_workfn(struct work_struct *work) 
>> 
>> switch (req_op(rq)) { 
>> case REQ_OP_DISCARD: 
>> + case REQ_OP_WRITE_ZEROES: 
>> op_type = OBJ_OP_DISCARD; 
>> break; 
>> case REQ_OP_WRITE: 
>> @@ -4420,6 +4421,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) 
>> q->limits.discard_granularity = segment_size; 
>> q->limits.discard_alignment = segment_size; 
>> blk_queue_max_discard_sectors(q, segment_size / SECTOR_SIZE); 
>> + blk_queue_max_write_zeroes_sectors(q, segment_size / SECTOR_SIZE); 
>> 
>> if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC)) 
>> q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; 
>> -- 
>> 2.4.3 
>> 
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
>> the body of a message to majordomo@xxxxxxxxxxxxxxx 
>> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
> 
> 
> -- 
> Jason 



-- 
Jason 
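
To illustrate the alignment point in the patch description (a zeroing request releases some or all blocks depending on whether it is rbd_obj_bytes() aligned), here is a minimal userspace sketch in C. It is not the driver's code: OBJ_BYTES, struct obj_extent and split_zero_range() are hypothetical names made up for this example, and the 4 MiB object size is an assumption rather than something stated in the thread. It only shows how a byte range splits on object boundaries and which pieces cover a whole object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 MiB objects; the real size is a property of the image. */
#define OBJ_BYTES   (4ULL << 20)
#define SECTOR_SIZE 512ULL

/* Hypothetical type: the part of a request that lands in one object. */
struct obj_extent {
    uint64_t objno;  /* index of the object within the image */
    uint64_t off;    /* byte offset inside that object */
    uint64_t len;    /* number of bytes covered inside that object */
    int whole;       /* 1 if the extent covers the entire object */
};

/*
 * Split [img_off, img_off + len) on object boundaries.
 * Writes at most max extents to out and returns how many were written.
 */
size_t split_zero_range(uint64_t img_off, uint64_t len,
                        struct obj_extent *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);

    while (len > 0 && n < max) {
        uint64_t off = img_off % OBJ_BYTES;
        uint64_t chunk = OBJ_BYTES - off;  /* bytes left in this object */

        if (chunk > len)
            chunk = len;

        out[n].objno = img_off / OBJ_BYTES;
        out[n].off = off;
        out[n].len = chunk;
        /* Only an extent that starts at 0 and spans the object is "whole". */
        out[n].whole = (off == 0 && chunk == OBJ_BYTES);
        n++;

        img_off += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, an 8 MiB request starting 1 MiB into the image splits into three extents under these assumptions: a partial tail of object 0, all of object 1, and a partial head of object 2. Only the middle extent covers a whole object. With the same assumed object size, the segment_size / SECTOR_SIZE limit set in the patch would work out to OBJ_BYTES / SECTOR_SIZE = 8192 sectors per request.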



