On Thu, Feb 29 2024 at 5:05P -0500,
Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:

> On 29/02/2024 21.22, Patrick Plenefisch wrote:
> > On Thu, Feb 29, 2024 at 2:56 PM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
> > >
> > > > Your understanding is correct. The only thing that comes to my mind to
> > > > cause the problem is asymmetry of the SATA devices. I have one 8TB
> > > > device, plus 1.5TB, 3TB, and 3TB drives. Doing math on the actual
> > > > extents, lowerVG/single spans (3TB+3TB), and
> > > > lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
> > > > the other leg of raid1 on the 8TB drive, but my thought was that the
> > > > jump across the 1.5+3TB drive gap was at least "interesting"
> > >
> > > what about lowerVG/works ?
> > >
> > That one is only on two disks, it doesn't span any gaps
>
> Sorry, but re-reading the original email I found something that I missed before:
>
> > BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr
> > 0, rd 0, flush 1, corrupt 0, gen 0
> > BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > tolerance is 0 for writable mount
> > BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO
> > failure (errors while submitting device barriers.)
>
> Looking at the code, it seems that if a FLUSH command fails, btrfs
> considers the disk missing. Then it cannot mount the device RW.
>
> I would investigate with the LVM developers whether the flush/barrier
> command is properly passed through all the layers when we have lvm
> over lvm (raid1). The fact that the lvm is a raid1 is important,
> because for a flush command to be honored it has to be honored by all
> the devices involved.

Hi Patrick,

Your initial report (start of this thread) mentioned that the
regression occurred with 5.19. The DM changes that landed during the
5.19 merge window refactored quite a bit of DM core's handling of bio
splitting (to simplify DM's newfound support for bio polling) -- Ming
Lei (now cc'd) and I wrote these changes:

e86f2b005a51 dm: simplify basic targets
bdb34759a0db dm: use bio_sectors in dm_accept_partial_bio
b992b40dfcc1 dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
e6926ad0c988 dm: pass dm_io instance to dm_io_acct directly
d3de6d12694d dm: switch to bdev based IO accounting interfaces
7dd76d1feec7 dm: improve bio splitting and associated IO accounting
2e803cd99ba8 dm: don't grab target io reference in dm_zone_map_bio
0f14d60a023c dm: improve dm_io reference counting
ec211631ae24 dm: put all polled dm_io instances into a single list
9d20653fe84e dm: simplify bio-based IO accounting further
4edadf6dcb54 dm: improve abnormal bio processing

I'll have a closer look at these DM commits (especially relative to
flush bios and your stacked device usage).

The last commit (4edadf6dcb54) is only marginally relevant, but it is
likely the easiest to revert from v5.19-rc2, as a simple test to see
whether it is somehow the problem (doubtful to be the cause, but worth
a try); see the sketch below.

(FYI, not relevant because it is specific to REQ_NOWAIT, but figured
I'd mention it: this commit earlier in the 5.19 DM changes was bogus:

563a225c9fd2 dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio

Jens fixed it with this stable@ commit:

a9ce385344f9 dm: don't attempt to queue IO under RCU protection)
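If you want to try that revert experiment, here is a rough, untested
sketch; it assumes you build and boot your own kernels, and the
build/install steps are generic, so adjust them to your setup:

  # test kernel with the last of the above commits reverted
  git checkout v5.19-rc2
  git revert 4edadf6dcb54              # dm: improve abnormal bio processing
  make olddefconfig && make -j"$(nproc)"
  # install the new kernel, reboot, and retry the RW mount of /dev/lvm/brokenDisk

If the RW mount still fails with the same barrier error on that
kernel, 4edadf6dcb54 is off the hook and the problem is elsewhere.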
> > > However yes, I agree that the pair of disks involved may be the answer
> > > of the problem.
> > >
> > > Could you show us the output of
> > >
> > > $ sudo pvdisplay -m
> >
> > I trimmed it, but kept the relevant bits (Free PE is thus not correct):
> >
> >   --- Physical volume ---
> >   PV Name               /dev/lowerVG/lvmPool
> >   VG Name               lvm
> >   PV Size               <3.00 TiB / not usable 3.00 MiB
> >   Allocatable           yes
> >   PE Size               4.00 MiB
> >   Total PE              786431
> >   Free PE               82943
> >   Allocated PE          703488
> >   PV UUID               7p3LSU-EAHd-xUg0-r9vT-Gzkf-tYFV-mvlU1M
> >
> >   --- Physical Segments ---
> >   Physical extent 0 to 159999:
> >     Logical volume    /dev/lvm/brokenDisk
> >     Logical extents   0 to 159999
> >   Physical extent 160000 to 339199:
> >     Logical volume    /dev/lvm/a
> >     Logical extents   0 to 179199
> >   Physical extent 339200 to 349439:
> >     Logical volume    /dev/lvm/brokenDisk
> >     Logical extents   160000 to 170239
> >   Physical extent 349440 to 351999:
> >     FREE
> >   Physical extent 352000 to 460026:
> >     Logical volume    /dev/lvm/brokenDisk
> >     Logical extents   416261 to 524287
> >   Physical extent 460027 to 540409:
> >     FREE
> >   Physical extent 540410 to 786430:
> >     Logical volume    /dev/lvm/brokenDisk
> >     Logical extents   170240 to 416260

Please provide the following from the guest that activates /dev/lvm/brokenDisk:

  lsblk
  dmsetup table

Please also provide the same from the host (just for completeness).

Also, I didn't see any kernel logs that show DM-specific errors, and I
doubt you'd have left any DM-specific errors out of your report. So is
btrfs the canary here? To be clear: you're only seeing btrfs errors in
the kernel log?

Mike
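p.s. if it helps, something like the following (untested) run on both
the guest and the host should capture the relevant stacking in one
pass; the dmsetup ls --tree output is optional but makes the
LVM-on-LVM nesting easy to see:

  lsblk -o NAME,KNAME,TYPE,SIZE,FSTYPE,MOUNTPOINT
  dmsetup table
  dmsetup ls --tree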