Hi Mike, Alasdair, Kiyoshi Ueda wrote: > On 09/17/2009 01:22 AM +0900, David Strand wrote: >> On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand@xxxxxxxxx> wrote: >>> I am issuing 512 Kbyte reads through the device mapper device node to >>> a fibre channel disk. With 2.6.30 one read command for the entire 512 >>> Kbyte length is placed on the wire. With 2.6.31 this is being broken >>> up into 5 smaller read commands placed on the wire, decreasing >>> performance. >>> >>> This is especially penalizing on some disks where we have prefetch >>> turned off via the scsi mode page. Is there any easy way (through >>> configuration or sysfs) to restore the single read per i/o behavior >>> that I used to get? >> >> I should note that I am using dm-mpath, and the i/o is fragmented on >> the wire when using the device mapper device node but it is not >> fragmented when using one of the regular /dev/sd* device nodes for >> that device. > > David, > Thank you for reporting this. > I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS, > which limits the I/O size small. > The attached patch fixes it. I guess the patch (and increasing > read-ahead size in /sys/block/dm-<n>/queue/read_ahead_kb) will solve > your fragmentation issue. Please try it. > > > Mike, Alasdair, > I found that max_sectors and max_hw_sectors of dm device are set > in smaller values than those of underlying devices. E.g: > # cat /sys/block/sdj/queue/max_sectors_kb > 512 > # cat /sys/block/sdj/queue/max_hw_sectors_kb > 32767 > # echo "0 10 linear /dev/sdj 0" | dmsetup create test > # cat /sys/block/dm-0/queue/max_sectors_kb > 127 > # cat /sys/block/dm-0/queue/max_hw_sectors_kb > 127 > This prevents the I/O size of struct request from becoming enough big > size, and causes undesired request fragmentation in request-based dm. > > This should be caused by the queue_limits stacking. > In dm_calculate_queue_limits(), the block-layer's small default size > is included in the merging process of target's queue_limits. > So underlying queue_limits is not propagated correctly. > > I think initializing default values of all max_* in '0' is an easy fix. > Do you think my patch is acceptable? > Any other idea to fix this problem? Well, sorry, we jumped the gun.. The patch should work fine for dm-multipath but setting '0' by default will cause problems on targets like 'zero' and 'error', which take no underlying device and use the default value. > blk_set_default_limits(limits); > + limits->max_sectors = 0; > + limits->max_hw_sectors = 0; So this should either set something very big (e.g. UINT_MAX) or set 0 by default but change to a certain safe value, if the end result of merging the limits is still 0. Attached is a revised patch with the latter approach. Please check this. If the approach is fine, I think we should bring this up to Jens whether to have these helpers in dm-table.c or move to block/blk-settings.c. Thanks, -- Jun'ichi Nomura, NEC Corporation max_sectors and max_hw_sectors of dm device are set to smaller values than those of underlying devices. E.g: # cat /sys/block/sdj/queue/max_sectors_kb 512 # cat /sys/block/sdj/queue/max_hw_sectors_kb 32767 # echo "0 10 linear /dev/sdj 0" | dmsetup create test # cat /sys/block/dm-0/queue/max_sectors_kb 127 # cat /sys/block/dm-0/queue/max_hw_sectors_kb 127 This prevents the I/O size of struct request from becoming large, and causes undesired request fragmentation in request-based dm. This is caused by the queue_limits stacking. In dm_calculate_queue_limits(), the block-layer's safe default value (SAFE_MAX_SECTORS, 255) is included in the merging process of target's queue_limits. So underlying queue_limits is not propagated correctly. Initialize default values of all max_*sectors to '0' and change the limits to SAFE_MAX_SECTORS only if the value is still '0' after merging. Signed-off-by: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx> Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx> Cc: David Strand <dpstrand@xxxxxxxxx> Cc: Mike Snitzer <snitzer@xxxxxxxxxx>, Cc: Alasdair G Kergon <agk@xxxxxxxxxx> --- drivers/md/dm-table.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) Index: linux-2.6.31/drivers/md/dm-table.c =================================================================== --- linux-2.6.31.orig/drivers/md/dm-table.c +++ linux-2.6.31/drivers/md/dm-table.c @@ -647,6 +647,28 @@ int dm_split_args(int *argc, char ***arg } /* + * blk_stack_limits() chooses min_not_zero max_sectors value of underlying + * devices. So set the default to 0. + * Otherwise, the default SAFE_MAX_SECTORS dominates even if all underlying + * devices have max_sectors values larger than that. + */ +static void _set_default_limits_for_stacking(struct queue_limits *limits) +{ + blk_set_default_limits(limits); + limits->max_sectors = 0; + limits->max_hw_sectors = 0; +} + +/* If there's no underlying device, use the default value in blockdev. */ +static void _adjust_limits_for_stacking(struct queue_limits *limits) +{ + if (limits->max_sectors == 0) + limits->max_sectors = SAFE_MAX_SECTORS; + if (limits->max_hw_sectors == 0) + limits->max_hw_sectors = SAFE_MAX_SECTORS; +} + +/* * Impose necessary and sufficient conditions on a devices's table such * that any incoming bio which respects its logical_block_size can be * processed successfully. If it falls across the boundary between @@ -684,7 +706,7 @@ static int validate_hardware_logical_blo while (i < dm_table_get_num_targets(table)) { ti = dm_table_get_target(table, i++); - blk_set_default_limits(&ti_limits); + _set_default_limits_for_stacking(&ti_limits); /* combine all target devices' limits */ if (ti->type->iterate_devices) @@ -707,6 +729,8 @@ static int validate_hardware_logical_blo device_logical_block_size_sects - next_target_start : 0; } + _adjust_limits_for_stacking(limits); + if (remaining) { DMWARN("%s: table line %u (start sect %llu len %llu) " "not aligned to h/w logical block size %u", @@ -991,10 +1015,10 @@ int dm_calculate_queue_limits(struct dm_ struct queue_limits ti_limits; unsigned i = 0; - blk_set_default_limits(limits); + _set_default_limits_for_stacking(limits); while (i < dm_table_get_num_targets(table)) { - blk_set_default_limits(&ti_limits); + _set_default_limits_for_stacking(&ti_limits); ti = dm_table_get_target(table, i++); -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel