On 2020/07/17 18:12, Ming Lei wrote:
> On Fri, Jul 17, 2020 at 08:22:45AM +0000, Damien Le Moal wrote:
>> On 2020/07/17 16:50, Ming Lei wrote:
>>> On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
>>>> On 2020/07/16 23:35, Christoph Hellwig wrote:
>>>>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
>>>>>> Max append sectors needs to be aligned to the physical block size,
>>>>>> otherwise we can end up in a situation where it is off by 1-3 sectors,
>>>>>> which would cause short writes with asynchronous zone append
>>>>>> submissions from an FS.
>>>>>
>>>>> Huh? The physical block size is purely a hint.
>>>>
>>>> For ZBC/ZAC SMR drives, all writes must be aligned to the physical
>>>> sector size.
>>>
>>> Then the physical block size should be the same as the logical block
>>> size. The real workable limit for an IO request is aligned to the
>>> logical block size.
>>
>> Yes, I know. This T10/T13 design is not the brightest thing they did...
>> On 512e SMR drives, addressing is in 512B LBA units, but all writes in
>> sequential zones must be 4K aligned (8 LBAs).
>
> Then the issue isn't related to the zone append command only. Just
> wondering how this special write block size alignment is enforced in
> sequential zones. So far, writes from an FS or to a raw block device are
> only logical block size aligned.

This is not enforced in sd/sd_zbc.c. If the user issues a non-4K-aligned
request, it will get back an "unaligned write" error from the drive. zonefs
and dm-zoned define a 4K block size to avoid that. Applications doing raw
block device accesses have to issue properly aligned writes themselves.

>
>>
>>>
>>>> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure
>>>> alignment to 4K on 512e disks. There is also nullblk, which uses the
>>>> default max_hw_sectors_kb of 255 x 512B sectors, which is not 4K
>>>> aligned if the nullb device is created with a 4K block size.
>>>
>>> Actually, the real limit is from max_sectors_kb, which is <=
>>> max_hw_sectors_kb, and both should be aligned to the logical block
>>> size, IMO.
>>
>> Yes, agreed, but for a nullblk device created with block size = 4K it is
>> not. So
>
> That is because of the default magic number BLK_SAFE_MAX_SECTORS.
>
>> one driver to patch for sure. However, I thought having some forced
>> alignment in blk_queue_max_hw_sectors() for limit->max_hw_sectors and
>> limit->max_sectors would avoid tripping on weird values for weird
>> drives...
>
> Maybe we can update it once the logical block size is available, such
> as:
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9a2c23cd9700..f9cbaadaa267 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -311,6 +311,14 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
>  }
>  EXPORT_SYMBOL(blk_queue_max_segment_size);
>
> +static unsigned blk_queue_round_sectors(struct request_queue *q,
> +		unsigned sectors)
> +{
> +	u64 bytes = sectors << 9;
> +
> +	return (unsigned)round_down(bytes, queue_logical_block_size(q));
> +}
> +
>  /**
>   * blk_queue_logical_block_size - set logical block size for the queue
>   * @q:  the request queue for the device
> @@ -330,6 +338,9 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
>
>  	if (q->limits.io_min < q->limits.physical_block_size)
>  		q->limits.io_min = q->limits.physical_block_size;
> +
> +	q->limits.max_sectors = blk_queue_round_sectors(q,
> +			q->limits.max_sectors)
>  }
>  EXPORT_SYMBOL(blk_queue_logical_block_size);

Yes, something like this is what I had in mind, so that 4Kn drives always get
a sensible value aligned to their 4K LBA. However, with the above, there is no
guarantee that max_sectors is already set when the logical block size is set,
and I am not sure about the reverse either. So the rounding may need to be
done in both blk_queue_logical_block_size() and blk_queue_max_hw_sectors().
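Something like the below in both setters is probably what is needed
(completely untested sketch, reusing the blk_queue_round_sectors() helper idea
from your diff, but converting the rounded byte count back to a 512B sector
count before storing it):

static unsigned int blk_queue_round_sectors(struct request_queue *q,
					    unsigned int sectors)
{
	u64 bytes = (u64)sectors << SECTOR_SHIFT;

	/* Round down to the logical block size, then back to 512B sectors */
	bytes = round_down(bytes, queue_logical_block_size(q));

	return (unsigned int)(bytes >> SECTOR_SHIFT);
}

void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
{
	q->limits.logical_block_size = size;

	if (q->limits.physical_block_size < size)
		q->limits.physical_block_size = size;

	if (q->limits.io_min < q->limits.physical_block_size)
		q->limits.io_min = q->limits.physical_block_size;

	/* In case the max sectors limits were set before the LBA size */
	q->limits.max_hw_sectors =
		blk_queue_round_sectors(q, q->limits.max_hw_sectors);
	q->limits.max_sectors =
		blk_queue_round_sectors(q, q->limits.max_sectors);
}

void blk_queue_max_hw_sectors(struct request_queue *q,
			      unsigned int max_hw_sectors)
{
	/* ... existing max_hw_sectors/max_sectors setup ... */

	/* In case the LBA size was set before the max sectors limits */
	q->limits.max_hw_sectors =
		blk_queue_round_sectors(q, q->limits.max_hw_sectors);
	q->limits.max_sectors =
		blk_queue_round_sectors(q, q->limits.max_sectors);
}

With this, whichever of the two setters is called last leaves both limits
aligned to the logical block size, regardless of the order the driver calls
them in.

-- 
Damien Le Moal
Western Digital Research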