Re: LVM-on-LVM: error while submitting device barriers

On Tue, Mar 05, 2024 at 12:45:13PM -0500, Mike Snitzer wrote:
> On Thu, Feb 29 2024 at  5:05P -0500,
> Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
> 
> > On 29/02/2024 21.22, Patrick Plenefisch wrote:
> > > On Thu, Feb 29, 2024 at 2:56 PM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
> > > > 
> > > > > Your understanding is correct. The only thing that comes to my mind to
> > > > > cause the problem is the asymmetry of the SATA devices. I have one 8TB
> > > > > device, plus 1.5TB, 3TB, and 3TB drives. Doing the math on the actual
> > > > > extents, lowerVG/single spans (3TB+3TB), and
> > > > > lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
> > > > > the other leg of raid1 on the 8TB drive, but my thought was that the
> > > > > jump across the 1.5TB+3TB drive gap was at least "interesting".
> > > > 
> > > > 
> > > > what about lowerVG/works ?
> > > > 
> > > 
> > > That one is only on two disks; it doesn't span any gaps.
> > 
> > Sorry, but re-reading the original email I found something that I missed before:
> > 
> > > BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr
> > > 0, rd 0, flush 1, corrupt 0, gen 0
> > > BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max
> >                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > tolerance is 0 for writable mount
> > > BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO
> > > failure (errors while submitting device barriers.)
> > 
> > Looking at the code, it seems that if a FLUSH command fails, btrfs
> > considers the disk to be missing. Then it cannot mount the device RW.
> > 
> > I would investigate with the LVM developers whether the flush/barrier
> > command is properly passed through all the layers when we have LVM on
> > top of LVM (raid1). The fact that the LVM volume is a raid1 is important,
> > because for a flush command to be honored it has to be honored by all
> > the devices involved.

Hello Patrick & Goffredo,

I can trigger this kind of btrfs complaint by simulating a single FLUSH failure.

If you can reproduce this issue easily, please collect a log with the
following bpftrace script, which may show where the flush failure happens
and may help narrow down the issue in the whole stack.


#!/usr/bin/bpftrace

#ifndef BPFTRACE_HAVE_BTF
#include <linux/blkdev.h>
#endif

// Track every bio submitted with REQ_PREFLUSH set, keyed by the bio
// pointer; remember the submitter's kernel stack for later reporting.
kprobe:submit_bio_noacct,
kprobe:submit_bio
/ (((struct bio *)arg0)->bi_opf & (1 << __REQ_PREFLUSH)) != 0 /
{
	@submit_stack[arg0] = kstack;
	@tracked[arg0] = 1;
}

kprobe:bio_endio
/@tracked[arg0] != 0/
{
	$bio = (struct bio *)arg0;

	// For chained bios, skip intermediate completions and only act on
	// the final one.
	if (($bio->bi_flags & (1 << BIO_CHAIN)) && $bio->__bi_remaining.counter > 1) {
		return;
	}

	// Report any tracked flush bio that completed with an error, along
	// with the stacks of its submitter and its completion context.
	if ($bio->bi_status != 0) {
		printf("dev %s bio failed %d, submitter %s completion %s\n",
			$bio->bi_bdev->bd_disk->disk_name,
			$bio->bi_status, @submit_stack[arg0], kstack);
	}
	delete(@submit_stack[arg0]);
	delete(@tracked[arg0]);
}

END {
	clear(@submit_stack);
	clear(@tracked);
}
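
For reference, a rough usage sketch (the script name is just an example):
save the above as flush_trace.bt and start it as root in one terminal with

	bpftrace flush_trace.bt

then reproduce the failure in another terminal (e.g. the RW mount of
/dev/mapper/lvm-brokenDisk). Any bio carrying REQ_PREFLUSH that completes
with an error should be reported with the device name, the error status,
and both the submission and completion stacks, which should show at which
layer of the lowerVG/lvmPool stack the flush actually fails.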



Thanks, 
Ming




