Re: [PATCH v7 04/11] block: Introduce REQ_OP_ZONE_APPEND

On 2020/04/19 1:46, Bart Van Assche wrote:
> On 2020-04-17 05:15, Johannes Thumshirn wrote:
>> From: Keith Busch <kbusch@xxxxxxxxxx>
>>
>> Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
>> block device. This is a no-merge write operation.
>>
>> A zone append write BIO must:
>> * Target a zoned block device
>> * Have a sector position indicating the start sector of the target zone
> 
> Why the start sector instead of any sector in the target zone? Wouldn't
> the latter make it easier to write software that uses REQ_OP_ZONE_APPEND?

We could do that, but we chose to have the interface match that of the other
zone operations (e.g. REQ_OP_ZONE_RESET/OPEN/CLOSE/FINISH), which also require
the zone start sector.

>> * The target zone must be a sequential write zone
>> * The BIO must not cross a zone boundary
>> * The BIO size must not be split to ensure that a single range of LBAs
>>   is written with a single command.
> 
> "BIO size must" -> "BIO must"?
> 
>> diff --git a/block/bio.c b/block/bio.c
>> index 0f0e337e46b4..97baadc6d964 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1006,7 +1006,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>>  				put_page(page);
>>  		} else {
>>  			if (WARN_ON_ONCE(bio_full(bio, len)))
>> -                                return -EINVAL;
>> +				 return -EINVAL;
>>  			__bio_add_page(bio, page, len, offset);
>>  		}
>>  		offset = 0;
> 
> Has the 'return' statement been indented correctly? I see a tab and a
> space in front of that statement instead of only a tab.
> 
>> @@ -1451,6 +1501,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
>>  	BUG_ON(sectors <= 0);
>>  	BUG_ON(sectors >= bio_sectors(bio));
>>  
>> +	/* Zone append commands cannot be split */
>> +	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
>> +		return NULL;
>> +
>>  	split = bio_clone_fast(bio, gfp, bs);
>>  	if (!split)
>>  		return NULL;
> 
> Zone append commands -> Zone append bio's?
> 
>> +/*
>> + * Check write append to a zoned block device.
>> + */
>> +static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>> +						 struct bio *bio)
>> +{
>> +	sector_t pos = bio->bi_iter.bi_sector;
>> +	int nr_sectors = bio_sectors(bio);
>> +
>> +	/* Only applicable to zoned block devices */
>> +	if (!blk_queue_is_zoned(q))
>> +		return BLK_STS_NOTSUPP;
>> +
>> +	/* The bio sector must point to the start of a sequential zone */
>> +	if (pos & (blk_queue_zone_sectors(q) - 1) ||
>> +	    !blk_queue_zone_is_seq(q, pos))
>> +		return BLK_STS_IOERR;
>> +
>> +	/*
>> +	 * Not allowed to cross zone boundaries. Otherwise, the BIO will be
>> +	 * split and could result in non-contiguous sectors being written in
>> +	 * different zones.
>> +	 */
>> +	if (blk_queue_zone_no(q, pos) != blk_queue_zone_no(q, pos + nr_sectors))
>> +		return BLK_STS_IOERR;
> 
> Can the above statement be simplified into the following?
> 
> 	if (nr_sectors > q->limits.chunk_sectors)
> 		return BLK_STS_IOERR;

That would be equivalent only if the zone append starts at the beginning of the
zone, i.e. if the zone is empty. If the zone is not empty, we need to check that
the zone append request does not cross over into the next zone, which would
result in the BIO being split by the block layer.

> 
>> +	/* Make sure the BIO is small enough and will not get split */
>> +	if (nr_sectors > q->limits.max_zone_append_sectors)
>> +		return BLK_STS_IOERR;
> 
> Do we really need a new request queue limit parameter? In which cases
> will max_zone_append_sectors differ from the zone size?

Yes, it can differ from the zone size. On real hardware, max_zone_append_sectors
will most of the time be equal to max_hw_sectors_kb, but it can also be smaller
than that. On the host side, since a zone append is a write, it is also subject
to the max_sectors_kb limit and cannot exceed that value.

> 
> Thanks,
> 
> Bart.
> 


-- 
Damien Le Moal
Western Digital Research
