On Fri, Mar 22, 2024 at 11:30:33AM +0100, Mikulas Patocka wrote:
> 
> 
> On Fri, 22 Mar 2024, Ming Lei wrote:
> 
> > On Thu, Mar 21, 2024 at 05:48:45PM +0100, Mikulas Patocka wrote:
> > > It may be possible to set up dm-integrity with smaller sector size than
> > > the logical sector size of the underlying device. In this situation,
> > > dm-integrity guarantees that the outgoing bios have the same alignment as
> > > incoming bios (so, if you create a filesystem with 4k block size,
> > > dm-integrity would send 4k-aligned bios to the underlying device).
> > > 
> > > This guarantee was broken when integrity_recheck was implemented.
> > > integrity_recheck sends bio that is aligned to ic->sectors_per_block. So
> > > if we set up integrity with 512-byte sector size on a device with logical
> > > block size 4k, we would be sending unaligned bio. This triggered a bug in
> > > one of our internal tests.
> > > 
> > > This commit fixes it - it determines what's the actual alignment of the
> > > incoming bio and then makes sure that the outgoing bio in
> > > integrity_recheck has the same alignment.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > > Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > 
> > > ---
> > >  drivers/md/dm-integrity.c |   12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6/drivers/md/dm-integrity.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/md/dm-integrity.c	2024-03-21 14:25:45.000000000 +0100
> > > +++ linux-2.6/drivers/md/dm-integrity.c	2024-03-21 17:47:39.000000000 +0100
> > > @@ -1699,7 +1699,6 @@ static noinline void integrity_recheck(s
> > >  	struct bio_vec bv;
> > >  	sector_t sector, logical_sector, area, offset;
> > >  	struct page *page;
> > > -	void *buffer;
> > >  
> > >  	get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
> > >  	dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset,
> > > @@ -1708,13 +1707,14 @@ static noinline void integrity_recheck(s
> > >  	logical_sector = dio->range.logical_sector;
> > >  
> > >  	page = mempool_alloc(&ic->recheck_pool, GFP_NOIO);
> > > -	buffer = page_to_virt(page);
> > >  
> > >  	__bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
> > >  		unsigned pos = 0;
> > >  
> > >  		do {
> > > +			sector_t alignment;
> > >  			char *mem;
> > > +			char *buffer = page_to_virt(page);
> > >  			int r;
> > >  			struct dm_io_request io_req;
> > >  			struct dm_io_region io_loc;
> > > @@ -1727,6 +1727,14 @@ static noinline void integrity_recheck(s
> > >  			io_loc.sector = sector;
> > >  			io_loc.count = ic->sectors_per_block;
> > >  
> > > +			/* Align the bio to logical block size */
> > > +			alignment = dio->range.logical_sector | bio_sectors(bio) | (PAGE_SIZE >> SECTOR_SHIFT);
> > > +			alignment &= -alignment;
> > 
> > The above is less readable, :-(
> 
> It isolates the lowest bit from dio->range.logical_sector,
> bio_sectors(bio) and (PAGE_SIZE >> SECTOR_SHIFT).
> 
> See for example this https://www.felixcloutier.com/x86/blsi

Fine, but I have to say such usage isn't popular.
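
For reference, here is a minimal userspace sketch of the lowest-set-bit
idiom used above; the helper name and the input values are made up for
illustration and are not taken from the driver:

#include <stdio.h>

/* In two's complement, -x flips every bit above the lowest set bit of x,
 * so x & -x keeps only that bit. */
static unsigned long lowest_set_bit(unsigned long x)
{
	return x & -x;
}

int main(void)
{
	/* Illustrative values, in 512-byte sectors: a starting sector of
	 * 0x1234, an 8-sector bio, and 8 sectors per 4k page. */
	unsigned long logical_sector = 0x1234;
	unsigned long bio_sectors = 8;
	unsigned long page_sectors = 8;

	unsigned long alignment =
		lowest_set_bit(logical_sector | bio_sectors | page_sectors);

	/* Prints 4: every input is a multiple of 4 sectors, but the
	 * starting sector is not a multiple of 8. */
	printf("alignment = %lu sectors\n", alignment);
	return 0;
}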
> 
> > > +			io_loc.sector = round_down(io_loc.sector, alignment);
> > > +			io_loc.count += sector - io_loc.sector;
> > > +			buffer += (sector - io_loc.sector) << SECTOR_SHIFT;
> > > +			io_loc.count = round_up(io_loc.count, alignment);
> > 
> > I feel the above code isn't very reliable, what we need actually is to
> > make sure that io's sector & size is aligned with dm's
> > bdev_logical_block_size(bdev).
> 
> I thought about using bdev_logical_block_size. But it may be wrong if the
> device stack is reconfigured. So, I concluded that taking the alignment
> from the bio would be better.

If the logical block size becomes mismatched by reconfiguration, the whole
DM stack can't work:

- at the beginning, DM sits over NVMe (512-byte blocks), so both DM's and
  NVMe's logical block size is 512
- later, the NVMe device is reconfigured and its logical block size becomes
  4k, but DM's logical block size can't be updated
- then unaligned IO is submitted to NVMe

So DM _never_ works with a logical block size mismatch caused by
reconfiguration, and the same holds for MD.

> > Yeah, so far the max logical block size is 4k, but it may be increased
> > in future and you can see the recent lsfmm proposal, so can we force it
> > to be aligned with bdev_logical_block_size(bdev) here?
> > 
> > Also can the above change work efficiently in case of 64K PAGE_SIZE?
> 
> It doesn't work efficiently at all - this piece of code is only run in a
> pathological case where the user writes into a buffer while reading it
> (or when he reads multiple blocks into the same buffer), so I optimized
> it for size, not for performance.
> 
> But yes, it works with 64K PAGE_SIZE.

Fine, but I still think PAGE_SIZE is harder to follow than the logical
block size.

Thanks,
Ming
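
As an addendum, a minimal userspace sketch of what the round_down/round_up
expansion discussed above computes; the macros only mimic the kernel's
power-of-two rounding helpers, and the values are purely illustrative:

#include <stdio.h>

/* Userspace stand-ins for the kernel's power-of-two rounding helpers. */
#define round_down(x, a)	((x) & ~((a) - 1))
#define round_up(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	/* Illustrative values: a dm-integrity block of 1 sector (512 B),
	 * and a computed alignment of 8 sectors (4k logical blocks). */
	unsigned long long sector = 0x1235;	/* block being rechecked */
	unsigned long long count = 1;		/* ic->sectors_per_block */
	unsigned long long alignment = 8;

	unsigned long long io_sector = round_down(sector, alignment);
	count += sector - io_sector;			/* grow downwards */
	unsigned long long buffer_off = (sector - io_sector) << 9;
	count = round_up(count, alignment);		/* grow upwards */

	/* Reads sectors 0x1230..0x1237; the wanted 512-byte block sits
	 * at byte offset 0xa00 inside the bounce buffer. */
	printf("sector=%#llx count=%llu buffer_off=%#llx\n",
	       io_sector, count, buffer_off);
	return 0;
}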