Adding Martin.

On 2020/05/11 22:23, Damien Le Moal wrote:
> On 2020/05/11 20:46, Damien Le Moal wrote:
>> On 2020/05/11 20:25, Hannes Reinecke wrote:
>>> On 5/11/20 12:55 PM, Damien Le Moal wrote:
>>>> On 2020/05/11 11:46, Damien Le Moal wrote:
>>>>> Mike,
>>>>>
>>>>> I am still seeing the warning:
>>>>>
>>>>> [ 1827.839756] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.856738] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.874031] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.891086] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>>
>>>>> when mixing 512B sector and 4KB sector devices. Investigating now.
>>>>
>>>> OK. Figured that one out: the 500GB SSD I am using for the regular device is
>>>> 976773168 512B sectors in capacity, that is, not a multiple of the 256MB zone
>>>> size, and not even a multiple of 4K. This causes the creation of a 12MB runt
>>>> zone of 24624 sectors, which is ignored. But the start sector of the second
>>>> device in the dm-table remains 976773168, so not aligned on 4K. This causes
>>>> bdev_stack_limits to return an error and the above messages to be printed.
>>>>
>>>> So I think we need to completely ignore the eventual "runt" zone of the
>>>> regular device so that everything aligns correctly. This will need changes
>>>> in both dmzadm and dm-zoned.
>>>>
>>>> Hannes, I can hack something on top of your series. Or can you resend with
>>>> that fixed ?
>>>>
>>> Does this one help?
>>
>> Nope. Same warning.
>>
>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>>> index ea43f6892ced..5daca82b5ec7 100644
>>> --- a/drivers/md/dm-zoned-target.c
>>> +++ b/drivers/md/dm-zoned-target.c
>>> @@ -1041,13 +1041,17 @@ static int dmz_iterate_devices(struct dm_target *ti,
>>>  {
>>>  	struct dmz_target *dmz = ti->private;
>>>  	unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
>>> +	unsigned int nr_zones;
>>>  	sector_t capacity;
>>>  	int r;
>>>
>>> -	capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
>>> +	nr_zones = DIV_ROUND_DOWN(dmz->dev[0].capacity, zone_nr_sectors);
>>> +	capacity = nr_zones * zone_nr_sectors;
>>
>> capacity = round_down(dmz->dev[0].capacity, zone_nr_sectors);
>>
>> is simpler :)
>>
>> In any case, your change does not seem to do anything here. Before and after,
>> the capacity is rounded down to full zones, excluding the last runt zone. I
>> think it has to do with the table entry start offset given on DM start by
>> dmzadm...
>>
>>>  	r = fn(ti, dmz->ddev[0], 0, capacity, data);
>>>  	if (!r && dmz->ddev[1]) {
>>> -		capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
>>> +		nr_zones = DIV_ROUND_DOWN(dmz->dev[1].capacity,
>>> +					  zone_nr_sectors);
>>> +		capacity = nr_zones * zone_nr_sectors;
>>>  		r = fn(ti, dmz->ddev[1], 0, capacity, data);
>>>  	}
>>>  	return r;
>>>
>>> Cheers,
>
> The failure of bdev_stack_limits() generating the warning is due to the io_opt
> limit not being compatible with the physical block size... Nothing to do with
> zone start/runt zones.
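As an aside, the runt zone numbers quoted above do check out, even though the
runt zone turned out not to be the cause. Here is a minimal standalone sketch
of the arithmetic (plain userspace C with the values from the report, not the
dm-zoned code; rounddown() is a local stand-in for the kernel helper):

/*
 * Standalone illustration of the runt zone arithmetic above, assuming
 * 256MB zones (524288 sectors of 512B) and the 500GB SSD capacity
 * reported earlier. Not the dm-zoned code itself.
 */
#include <stdio.h>

typedef unsigned long long sector_t;

/* Stand-in for the kernel helper; the mask form in the original code
 * (x & ~(align - 1)) is equivalent when align is a power of two. */
static sector_t rounddown(sector_t x, sector_t align)
{
	return x - (x % align);
}

int main(void)
{
	sector_t capacity = 976773168ULL;	/* 500GB SSD, 512B sectors */
	sector_t zone_nr_sectors = 524288ULL;	/* 256MB zone size */
	sector_t usable = rounddown(capacity, zone_nr_sectors);
	sector_t runt = capacity - usable;

	printf("full zones : %llu\n", capacity / zone_nr_sectors);
	printf("usable     : %llu sectors\n", usable);
	printf("runt zone  : %llu sectors (~%llu MB)\n",
	       runt, runt * 512 / (1024 * 1024));
	return 0;
}

This prints 1863 full zones, 976748544 usable sectors and a 24624-sector
(~12MB) runt, matching the numbers above.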
> The problem is here:
>
> /* Optimal I/O a multiple of the physical block size? */
> if (t->io_opt & (t->physical_block_size - 1)) {
> 	t->io_opt = 0;
> 	t->misaligned = 1;
> 	ret = -1;
> }
>
> For the ssd (t), I have io_opt at 512 and the physical block size at 4096,
> changed by the stacking from the device's real 512B phys block to the SMR
> disk phys block. The SMR disk io_opt is 0, so the ssd io_opt remains
> unchanged at 512. And we end up with the misaligned trigger since
> 512 & 4095 = 512...
>
> I do not understand this clearly yet... I wonder why the io_opt for the SMR
> drive is not 4096, same as the physical sector size.

I investigated this and here is what I found out:

When the dual drive setup is started, dm_calculate_queue_limits() is called and
ti->type->iterate_devices(ti, dm_set_device_limits, &ti_limits) is executed. In
dm-zoned, the iterate_devices method executes dm_set_device_limits() twice,
once for each drive of the setup.

The drives I am using are an M.2 SSD with a phys sector size of 512B and an
optimal IO size of 512, and an SMR drive with a phys sector size of 4K and an
optimal IO size of 0 (the drive does not report any value, which is not
uncommon for HDDs).

The result of the bdev_stack_limits() execution from dm_set_device_limits()
gives a DM device phys sector of 4K, no surprise. The io_opt limit though ends
up being 512 = lcm(0, 512). That results in this condition triggering:

/* Optimal I/O a multiple of the physical block size? */
if (t->io_opt & (t->physical_block_size - 1)) {
	t->io_opt = 0;
	t->misaligned = 1;
	ret = -1;
}

since 512 & (4096 - 1) is not 0...

It looks to me like we should either have io_opt always be at least the phys
sector size, or change the io_opt handling in the limit stacking. I am not sure
which approach is best... Thoughts ?

Martin,

Any idea why the io_opt limit is not set to the physical block size when the
drive does not report an optimal transfer length ? Would it be bad to set that
value instead of leaving it to 0 ?

-- 
Damien Le Moal
Western Digital Research
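PS: to put concrete numbers on the stacking behavior described above, here is
a quick userspace approximation. It is only a sketch: gcd() and lcm_not_zero()
are local helpers written for illustration, and the two stacking steps stand
in for the per-device dm_set_device_limits() calls, ending with the condition
quoted earlier:

/*
 * Userspace approximation of the io_opt stacking described above.
 * Not the kernel code: the helpers are local, and the two "stack"
 * steps mimic stacking the SSD and then the SMR drive limits.
 */
#include <stdio.h>

static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* lcm that keeps the non-zero side when one value is "not reported" */
static unsigned int lcm_not_zero(unsigned int a, unsigned int b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a / gcd(a, b) * b;
}

int main(void)
{
	unsigned int io_opt = 0, phys = 0;

	/* SSD: phys block 512B, io_opt 512 */
	io_opt = lcm_not_zero(io_opt, 512);
	phys = phys > 512 ? phys : 512;

	/* SMR drive: phys block 4096B, io_opt 0 (not reported) */
	io_opt = lcm_not_zero(io_opt, 0);	/* io_opt stays 512 */
	phys = phys > 4096 ? phys : 4096;	/* phys bumped to 4K */

	/* the condition quoted above from the limit stacking code */
	if (io_opt & (phys - 1))
		printf("misaligned: io_opt=%u, phys=%u, io_opt & (phys - 1) = %u\n",
		       io_opt, phys, io_opt & (phys - 1));
	return 0;
}

Running this prints the misaligned case with io_opt=512 against a 4096B phys
block, i.e. exactly the 512 & 4095 = 512 situation reported above.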