On Wed, May 22, 2024 at 12:48:59PM -0400, Mike Snitzer wrote:
> [ 74.872485] blk_insert_cloned_request: over max size limit. (2048 > 1024)
> [ 74.872505] device-mapper: multipath: 254:3: Failing path 8:16.
> [ 74.872620] blk_insert_cloned_request: over max size limit. (2048 > 1024)
> [ 74.872641] device-mapper: multipath: 254:3: Failing path 8:32.
> [ 74.872712] blk_insert_cloned_request: over max size limit. (2048 > 1024)
> [ 74.872732] device-mapper: multipath: 254:3: Failing path 8:48.
> [ 74.872788] blk_insert_cloned_request: over max size limit. (2048 > 1024)
> [ 74.872808] device-mapper: multipath: 254:3: Failing path 8:64.
>
> Simply setting max_user_sectors won't help with stacked devices
> because blk_stack_limits() doesn't stack max_user_sectors. It'll
> inform the underlying device's blk_validate_limits() calculation which
> will result in max_sectors having the desired value (which it already
> did, as I showed above). But when stacking limits from underlying
> devices up to the higher-level dm-mpath queue_limits we still have
> information loss.

So while I can't reproduce it, I think the main issue is that
max_sectors really just is a voluntary limit, and enforcing that at
the lower device doesn't really make any sense. So we could just
change blk_insert_cloned_request to check max_hw_sectors instead.

Or, my preferred variant below: just drop the check, as the
max_sectors == 0 check indicates it's pretty sketchy to start with.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fc364a226e952f..61b108aa20044d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3041,29 +3041,9 @@ void blk_mq_submit_bio(struct bio *bio)
 blk_status_t blk_insert_cloned_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	unsigned int max_sectors = blk_queue_get_max_sectors(q, req_op(rq));
 	unsigned int max_segments = blk_rq_get_max_segments(rq);
 	blk_status_t ret;
 
-	if (blk_rq_sectors(rq) > max_sectors) {
-		/*
-		 * SCSI device does not have a good way to return if
-		 * Write Same/Zero is actually supported. If a device rejects
-		 * a non-read/write command (discard, write same,etc.) the
-		 * low-level device driver will set the relevant queue limit to
-		 * 0 to prevent blk-lib from issuing more of the offending
-		 * operations. Commands queued prior to the queue limit being
-		 * reset need to be completed with BLK_STS_NOTSUPP to avoid I/O
-		 * errors being propagated to upper layers.
-		 */
-		if (max_sectors == 0)
-			return BLK_STS_NOTSUPP;
-
-		printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
-			__func__, blk_rq_sectors(rq), max_sectors);
-		return BLK_STS_IOERR;
-	}
-
 	/*
 	 * The queue settings related to segment counting may differ from the
 	 * original queue.
>
> Mike
---end quoted text---
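
To illustrate the other option mentioned in this mail (checking
max_hw_sectors rather than max_sectors), here is a small user-space
model in C. It is only a sketch, not kernel code: struct model_limits,
fits_soft_limit() and fits_hw_limit() are made-up names, and a
max_hw_sectors of 4096 is an assumed value. Only the 2048-sector
request and the 1024-sector max_sectors come from the log quoted in
this mail. The per-operation limits (discard, write zeroes) and the
max_sectors == 0 case are left out on purpose.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two limits discussed in this mail.
 * This is NOT the kernel's struct queue_limits.
 */
struct model_limits {
	unsigned int max_sectors;	/* soft, voluntary limit */
	unsigned int max_hw_sectors;	/* hard limit (assumed value in examples) */
};

/* Models the check the patch removes: compare against the soft limit. */
bool fits_soft_limit(unsigned int rq_sectors, const struct model_limits *lim)
{
	return rq_sectors <= lim->max_sectors;
}

/* Models the alternative: only reject what exceeds the hard limit. */
bool fits_hw_limit(unsigned int rq_sectors, const struct model_limits *lim)
{
	return rq_sectors <= lim->max_hw_sectors;
}
```

With a lower device modelled as { .max_sectors = 1024, .max_hw_sectors
= 4096 }, a 2048-sector request fails fits_soft_limit(), matching the
"2048 > 1024" messages in the log, but passes fits_hw_limit(). Whether
that is the right trade-off compared to dropping the check entirely is
exactly the open question in this thread.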