On Sun, Mar 10, 2024 at 02:11:11PM -0400, Patrick Plenefisch wrote:
> On Sun, Mar 10, 2024 at 11:27 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> >
> > On Sun, Mar 10 2024 at 7:34P -0400,
> > Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > > On Sat, Mar 09, 2024 at 03:39:02PM -0500, Patrick Plenefisch wrote:
> > > > On Wed, Mar 6, 2024 at 11:00 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > > >
> > > > > #!/usr/bin/bpftrace
> > > > >
> > > > > #ifndef BPFTRACE_HAVE_BTF
> > > > > #include <linux/blkdev.h>
> > > > > #endif
> > > > >
> > > > > kprobe:submit_bio_noacct,
> > > > > kprobe:submit_bio
> > > > > / (((struct bio *)arg0)->bi_opf & (1 << __REQ_PREFLUSH)) != 0 /
> > > > > {
> > > > > 	$bio = (struct bio *)arg0;
> > > > > 	@submit_stack[arg0] = kstack;
> > > > > 	@tracked[arg0] = 1;
> > > > > }
> > > > >
> > > > > kprobe:bio_endio
> > > > > /@tracked[arg0] != 0/
> > > > > {
> > > > > 	$bio = (struct bio *)arg0;
> > > > >
> > > > > 	if (($bio->bi_flags & (1 << BIO_CHAIN)) && $bio->__bi_remaining.counter > 1) {
> > > > > 		return;
> > > > > 	}
> > > > >
> > > > > 	if ($bio->bi_status != 0) {
> > > > > 		printf("dev %s bio failed %d, submitter %s completion %s\n",
> > > > > 			$bio->bi_bdev->bd_disk->disk_name,
> > > > > 			$bio->bi_status, @submit_stack[arg0], kstack);
> > > > > 	}
> > > > > 	delete(@submit_stack[arg0]);
> > > > > 	delete(@tracked[arg0]);
> > > > > }
> > > > >
> > > > > END {
> > > > > 	clear(@submit_stack);
> > > > > 	clear(@tracked);
> > > > > }
> > > > >
> > > >
> > > > Attaching 4 probes...
> > > > dev dm-77 bio failed 10, submitter
> > > > 	submit_bio_noacct+5
> > > > 	__send_duplicate_bios+358
> > > > 	__send_empty_flush+179
> > > > 	dm_submit_bio+857
> > > > 	__submit_bio+132
> > > > 	submit_bio_noacct_nocheck+345
> > > > 	write_all_supers+1718
> > > > 	btrfs_commit_transaction+2342
> > > > 	transaction_kthread+345
> > > > 	kthread+229
> > > > 	ret_from_fork+49
> > > > 	ret_from_fork_asm+27
> > > > completion
> > > > 	bio_endio+5
> > > > 	dm_submit_bio+955
> > > > 	__submit_bio+132
> > > > 	submit_bio_noacct_nocheck+345
> > > > 	write_all_supers+1718
> > > > 	btrfs_commit_transaction+2342
> > > > 	transaction_kthread+345
> > > > 	kthread+229
> > > > 	ret_from_fork+49
> > > > 	ret_from_fork_asm+27
> > > >
> > > > dev dm-86 bio failed 10, submitter
> > > > 	submit_bio_noacct+5
> > > > 	write_all_supers+1718
> > > > 	btrfs_commit_transaction+2342
> > > > 	transaction_kthread+345
> > > > 	kthread+229
> > > > 	ret_from_fork+49
> > > > 	ret_from_fork_asm+27
> > > > completion
> > > > 	bio_endio+5
> > > > 	clone_endio+295
> > > > 	clone_endio+295
> > > > 	process_one_work+369
> > > > 	worker_thread+635
> > > > 	kthread+229
> > > > 	ret_from_fork+49
> > > > 	ret_from_fork_asm+27
> > > >
> > > >
> > > > For context, dm-86 is /dev/lvm/brokenDisk and dm-77 is /dev/lowerVG/lvmPool
> > >
> > > io_status is 10 (BLK_STS_IOERR), which is produced first in the
> > > submission code path on /dev/dm-77 (/dev/lowerVG/lvmPool), so it looks
> > > like a device mapper issue.
> > >
> > > The error can only come from the following code:
> > >
> > > static void __map_bio(struct bio *clone)
> > >
> > > ...
> > > 		if (r == DM_MAPIO_KILL)
> > > 			dm_io_dec_pending(io, BLK_STS_IOERR);
> > > 		else
> > > 			dm_io_dec_pending(io, BLK_STS_DM_REQUEUE);
> > > 		break;
> >
> > I agree that the above bpf stack traces for dm-77 indicate that
> > dm_submit_bio failed, which would end up in the above branch if the
> > target's ->map() returned DM_MAPIO_KILL or DM_MAPIO_REQUEUE.
> >
> > But such an early failure speaks to the flush bio never being
> > submitted to the underlying storage. No?
> >
> > dm-raid.c:raid_map does return DM_MAPIO_REQUEUE with:
> >
> > 	/*
> > 	 * If we're reshaping to add disk(s)), ti->len and
> > 	 * mddev->array_sectors will differ during the process
> > 	 * (ti->len > mddev->array_sectors), so we have to requeue
> > 	 * bios with addresses > mddev->array_sectors here or
> > 	 * there will occur accesses past EOD of the component
> > 	 * data images thus erroring the raid set.
> > 	 */
> > 	if (unlikely(bio_end_sector(bio) > mddev->array_sectors))
> > 		return DM_MAPIO_REQUEUE;
> >
> > But a flush doesn't have an end_sector (it'd be 0 afaik), so it seems
> > weird relative to a flush.
> >
> > > Patrick, you mentioned lvmPool is raid1, can you explain how lvmPool is
> > > built? Is it a dm-raid1 target, or a plain raid1 device built over
> > > /dev/lowerVG?
>
> LVM raid1:
> lvcreate --type raid1 -m 1 ...

OK, that is the reason, as Mike mentioned.

dm-raid.c:raid_map() returns DM_MAPIO_REQUEUE, which dm_io_complete()
translates into BLK_STS_IOERR.

btrfs sends an empty flush bio with both .bi_size and .bi_sector set to
zero. But the top dm device is linear, and linear_map() remaps
bio->bi_iter.bi_sector; the remapped bio is then sent to dm-raid, where
raid_map() returns DM_MAPIO_REQUEUE.

The one-line patch I sent in my last email should fix this issue:

https://lore.kernel.org/dm-devel/a783e5ed-db56-4100-956a-353170b1b7ed@xxxxxxxxx/T/#m8fce3ecb2f98370b7d7ce8db6714bbf644af5459

But the misuse of DM_MAPIO_REQUEUE needs a closer look, and I believe
Mike is working on that bigger problem.

I suspect most dm targets don't handle empty bios well: at least linear
and dm-raid don't, and I haven't looked into the others yet. :-(

Thanks,
Ming