On 15/12/2023 02:27, Ming Lei wrote:
On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
Currently an IO size is limited to the request_queue limits max_sectors.
Limit the size for an atomic write to queue limit atomic_write_max_sectors
value.
Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
block/blk-merge.c | 12 +++++++++++-
block/blk.h | 3 +++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 0ccc251e22ff..8d4de9253fe9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
{
unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
- unsigned max_sectors = lim->max_sectors, start, end;
+ unsigned max_sectors, start, end;
+
+ /*
+ * We ignore lim->max_sectors for atomic writes simply because
+ * it may less than bio->write_atomic_unit, which we cannot
+ * tolerate.
+ */
+ if (bio->bi_opf & REQ_ATOMIC)
+ max_sectors = lim->atomic_write_max_sectors;
+ else
+ max_sectors = lim->max_sectors;
Please note that I mentioned this issue in the cover letter, so you can
see some discussion there (if missed).
I can understand the trouble for write atomic from bio split, which
may simply split in the max_sectors boundary, however this change is
still too fragile:
1) ->max_sectors may be set from userspace
- so this change simply override userspace setting
yes
2) otherwise ->max_sectors is same with ->max_hw_sectors:
- then something must be wrong in device side or driver side because
->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
to be figured out before device is setup
Right, so I think that it is proper to limit atomic_write_unit_max et al
to max_hw_sectors or whatever DMA engine device limits. I can make that
change.
3) too big max_sectors may break driver or device, such as nvme-pci
aligns max_hw_sectors with DMA optimized mapping size
And there might be more(better) choices:
1) make sure atomic write limit is respected when userspace updates
->max_sectors
My mind has been changed and I would say no and we can treat
atomic_write_unit_max as special. Indeed, max_sectors and
atomic_write_unit_max are complementary. If someone sets max_sectors to
4KB and then tries an atomic write of 16KB then they don't know what
they are doing.
My original idea was to dynamically limit atomic_unit_unit_max et al to
max_sectors (so that max_sectors is always respected for atomic writes).
As an alternative, how about we keep the value of atomic_unit_unit_max
static, but reject an atomic write if it exceeds max_sectors? It's not
too different than dynamically limiting atomic_unit_unit_max. But as
mentioned, it may be asking for trouble....
2) when driver finds that atomic write limits conflict with other
existed hardware limits, fail or solve(such as reduce write atomic unit) the
conflict before queue is started; With single write atomic limits update API,
the conflict can be figured out earlier by block layer too.
I think that we can do this, but I am not sure what other queue limits
may conflict (apart from max_sectors / max_sectors_hw). There is max
segment size, but we just rely on a single PAGE per iovec to evaluate
atomic_unit_unit_max, so should not be an issue.
Thanks,
John