Re: [REGRESSION] LVM-on-LVM: error while submitting device barriers

On 28/02/2024 20.37, Patrick Plenefisch wrote:
On Wed, Feb 28, 2024 at 2:19 PM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:

On 28/02/2024 18.25, Patrick Plenefisch wrote:
I'm unsure if this is just an LVM bug, or a BTRFS+LVM interaction bug,
but LVM is definitely involved somehow.
After upgrading from 5.10 to 6.1, I noticed one of my filesystems had
become read-only. In dmesg, I found:

BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO failure (errors while submitting device barriers.)
BTRFS info (device dm-75: state E): forced readonly
BTRFS warning (device dm-75: state E): Skipping commit of aborted transaction.
BTRFS: error (device dm-75: state EA) in cleanup_transaction:1992: errno=-5 IO failure

At first I suspected a btrfs error, but a scrub found no errors, and
it continued to be read-write on 5.10 kernels.

Here is my setup:

/dev/lvm/brokenDisk is an LVM-on-LVM volume. I have /dev/sd{a,b,c,d}
(of varying sizes) in a lower VG, which has three LVs, all RAID1
volumes. Two of those LVs are further used as PVs for upper VGs.
One of the upper VGs has no issues, and the non-PV LV has no issues.
The remaining LV, /dev/lowerVG/lvmPool, hosts the nested LVM: it is
used as a PV for VG "lvm" and has 3 volumes inside. Two of those
volumes have no issues (and are btrfs), but the last one,
/dev/lvm/brokenDisk, is the only volume that exhibits this behavior,
so something about it is special.

Or described as layers:
/dev/sd{a,b,c,d} => PV => VG "lowerVG"
/dev/lowerVG/single (RAID1 LV) => BTRFS, works fine
/dev/lowerVG/works (RAID1 LV) => PV => VG "workingUpper"
/dev/workingUpper/{a,b,c} => BTRFS, works fine
/dev/lowerVG/lvmPool (RAID1 LV) => PV => VG "lvm"
/dev/lvm/{a,b} => BTRFS, works fine
/dev/lvm/brokenDisk => BTRFS, Exhibits errors
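
For anyone trying to picture the nesting, a stack shaped like this can be
assembled roughly as follows. This is only a sketch: the LV and VG names
are taken from the description above, but the sizes, the single mirror
(-m1), and any lvm.conf filter/devices changes needed so that LVM will
scan its own LVs as PVs are assumptions, not details from the report.

$ pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
$ vgcreate lowerVG /dev/sda /dev/sdb /dev/sdc /dev/sdd

# three RAID1 LVs in the lower VG
$ lvcreate --type raid1 -m1 -L 1T -n single  lowerVG
$ lvcreate --type raid1 -m1 -L 1T -n works   lowerVG
$ lvcreate --type raid1 -m1 -L 1T -n lvmPool lowerVG

# two of them become PVs for the upper VGs
$ pvcreate /dev/lowerVG/works /dev/lowerVG/lvmPool
$ vgcreate workingUpper /dev/lowerVG/works
$ vgcreate lvm /dev/lowerVG/lvmPool

# plain LVs inside the nested VG "lvm" (workingUpper's a/b/c omitted)
$ lvcreate -L 100G -n a lvm
$ lvcreate -L 100G -n b lvm
$ lvcreate -L 100G -n brokenDisk lvm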

I am a bit curious about the reasons for this setup.

The lowerVG is supposed to be a pool of storage for several VMs &
containers. [workingUpper] is for one VM, and [lvm] is for another VM.
However, right now I'm still trying to organize the files directly,
because I don't have all the VMs fully set up yet.

However, I understood it as:

/dev/sda -+                +-- single (RAID1) -> ok             +-> a   ok
/dev/sdb  |                |                                    |-> b   ok
/dev/sdc  +--> [lowerVG]>--+-- works (RAID1) -> [workingUpper] -+-> c   ok
/dev/sdd -+                |
                           |                              +-> a          -> ok
                           +-- lvmPool (RAID1) -> [lvm] --+-> b          -> ok
                                                          +-> brokenDisk -> fail

[xxx] means a VG; the others are LVs, which may also act as PVs in
an upper VG.

Note that lvmPool is also RAID1, but yes.


So, it seems that

1) lowerVG/lvmPool/lvm/a
2) lowerVG/lvmPool/lvm/a
3) lowerVG/lvmPool/lvm/brokenDisk

are equivalent ... so I don't understand how 1) and 2) are fine but 3) is
problematic.

I assume you meant lvm/b for 2?

Yes


Is my understanding of the LVM layouts correct?

Your understanding is correct. The only thing that comes to mind as a
possible cause of the problem is the asymmetry of the SATA devices. I
have one 8TB drive, plus 1.5TB, 3TB, and 3TB drives. Doing the math on
the actual extents, lowerVG/single spans (3TB+3TB), and
lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
the other leg of the RAID1 on the 8TB drive, but my thought was that the
jump across the 1.5TB+3TB drive gap was at least "interesting".
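
As a side note, the extent layout being computed by hand here can also be
read directly from LVM; a report along these lines (the exact option set
is an assumption, not something requested in the thread) shows which PVs
each RAID1 image of an LV sits on:

$ sudo lvs -a -o lv_name,segtype,seg_size,devices lowerVG

The -a flag includes the hidden *_rimage_*/*_rmeta_* sub-LVs, which is
where the per-drive placement of each RAID1 leg actually shows up.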


What about lowerVG/works?

However, yes, I agree that the pair of disks involved may be the
explanation for the problem.

Could you show us the output of

$ sudo pvdisplay -m
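
For this particular stack the interesting PVs would presumably be the four
disks plus the nested PV, so (device paths assumed from the layout above)
the same information could be narrowed down with:

$ sudo pvdisplay -m /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/lowerVG/lvmPool

With -m, pvdisplay lists for each PV which physical extent ranges map to
which LV segment, which, combined across the two LVM layers, is what is
needed to check whether brokenDisk really straddles the 1.5TB and 3TB
drives.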





After some investigation, here is what I've found:

1. This regression was introduced in 5.19. On 5.18 and earlier kernels I
can keep this filesystem rw and everything works as expected, while on
5.19.0 and later the filesystem goes read-only immediately on any write
attempt. I couldn't build rc1, but I did confirm that rc2 already has
this regression (a bisect sketch over that range follows below).
2. Passing /dev/lvm/brokenDisk to a KVM VM as /dev/vdb exhibits the ro
barrier problem even with an unaffected kernel inside the VM.
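
Narrowing "somewhere between v5.18 and v5.19-rc2" down to a single commit
is a standard kernel bisect; purely as an illustration (not something
shown in the excerpt above), it could be driven like this:

$ git bisect start
$ git bisect bad v5.19-rc2
$ git bisect good v5.18
# build and boot each suggested revision, try a write on the btrfs on
# /dev/lvm/brokenDisk, then report the result:
$ git bisect good    # or: git bisect bad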

Is /dev/lvm/brokenDisk *always* problematic, with both affected (>= 5.19)
and unaffected (< 5.19) kernels?

Yes. I didn't test it in as much depth, but 5.15 and 6.1 in the VM
(with 6.1 on the host) are identically problematic.


3. Passing /dev/lowerVG/lvmPool to a KVM VM as /dev/vdb, with an
affected kernel and LVM running inside the VM, exhibits correct
behavior (I can keep the filesystem rw; no barrier errors on host or
guest).

Is /dev/lowerVG/lvmPool problematic only with an "affected" kernel?

Uh, passing lvmPool directly to the VM is never problematic. I tested
5.10 and 6.1 in the VM (and 6.1 on the host), and neither setup throws
barrier errors.
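
The exact way the LVs were handed to the guest isn't spelled out above;
for anyone trying to reproduce this, a plain virtio-blk passthrough along
the following lines would give the guest a /dev/vdb (the qemu invocation
and the rootdisk.img name are assumptions, not the reporter's setup):

$ qemu-system-x86_64 -enable-kvm -m 2G \
    -drive file=rootdisk.img,format=qcow2,if=virtio \
    -drive file=/dev/lvm/brokenDisk,format=raw,if=virtio,cache=none

The first virtio disk shows up as /dev/vda in the guest and the second as
/dev/vdb; cache=none keeps the host page cache out of the data path,
which is the usual choice when passing a raw block device through for
this kind of test.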

[...]


--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




