Hello,
I've provisioned an LVM RAID 6 LV across five physical disks (/dev/sdc through /dev/sdg).
I'm trying to understand the RAID behavior after injecting a failure by removing the physical disk /dev/sdc.
pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
vgcreate pool_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
lvcreate -l +100%FREE -n pool_lv --type raid6 pool_vg
mkfs.xfs /dev/pool_vg/pool_lv
echo "/dev/mapper/pool_vg-pool_lv /mnt xfs defaults,x-systemd.mount-timeout=30 0 0" >> /etc/fstab
Everything appears to be working fine:
# pvs --segments -o pv_name,pv_size,seg_size,vg_name,lv_name,lv_attr,lv_size,seg_pe_ranges
PV PSize SSize VG LV Attr LSize PE Ranges
/dev/sda3 <49.00g <24.50g ubuntu-vg ubuntu-lv -wi-ao---- <24.50g /dev/sda3:0-6270
/dev/sda3 <49.00g 24.50g ubuntu-vg 0
/dev/sdc <100.00g 4.00m pool_vg [pool_lv_rmeta_0] ewi-aor--- 4.00m /dev/sdc:0-0
/dev/sdc <100.00g 99.99g pool_vg [pool_lv_rimage_0] iwi-aor--- 99.99g /dev/sdc:1-25598
/dev/sdd <100.00g 4.00m pool_vg [pool_lv_rmeta_1] ewi-aor--- 4.00m /dev/sdd:0-0
/dev/sdd <100.00g 99.99g pool_vg [pool_lv_rimage_1] iwi-aor--- 99.99g /dev/sdd:1-25598
/dev/sde <100.00g 4.00m pool_vg [pool_lv_rmeta_2] ewi-aor--- 4.00m /dev/sde:0-0
/dev/sde <100.00g 99.99g pool_vg [pool_lv_rimage_2] iwi-aor--- 99.99g /dev/sde:1-25598
/dev/sdf <100.00g 4.00m pool_vg [pool_lv_rmeta_3] ewi-aor--- 4.00m /dev/sdf:0-0
/dev/sdf <100.00g 99.99g pool_vg [pool_lv_rimage_3] iwi-aor--- 99.99g /dev/sdf:1-25598
/dev/sdg <100.00g 4.00m pool_vg [pool_lv_rmeta_4] ewi-aor--- 4.00m /dev/sdg:0-0
/dev/sdg <100.00g 99.99g pool_vg [pool_lv_rimage_4] iwi-aor--- 99.99g /dev/sdg:1-25598
# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
LV Attr Cpy%Sync Health Devices
pool_lv rwi-aor--- 100.00 pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
[pool_lv_rimage_0] iwi-aor--- /dev/sdc(1)
[pool_lv_rimage_1] iwi-aor--- /dev/sdd(1)
[pool_lv_rimage_2] iwi-aor--- /dev/sde(1)
[pool_lv_rimage_3] iwi-aor--- /dev/sdf(1)
[pool_lv_rimage_4] iwi-aor--- /dev/sdg(1)
[pool_lv_rmeta_0] ewi-aor--- /dev/sdc(0)
[pool_lv_rmeta_1] ewi-aor--- /dev/sdd(0)
[pool_lv_rmeta_2] ewi-aor--- /dev/sde(0)
[pool_lv_rmeta_3] ewi-aor--- /dev/sdf(0)
[pool_lv_rmeta_4] ewi-aor--- /dev/sdg(0)
After /dev/sdc is removed and the system is rebooted, the RAID goes into the "partial" health state and is no longer accessible.
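In case it matters for reproduction: I believe an equivalent hot-removal can also be triggered from inside the guest, without touching the hypervisor (untested sketch, assuming /dev/sdc is a plain SCSI disk):

echo 1 > /sys/block/sdc/device/delete   # ask the SCSI layer to drop the device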
# lvs -a -o name,lv_attr,copy_percent,health_status,devices pool_vg
WARNING: Couldn't find device with uuid 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3.
WARNING: VG pool_vg is missing PV 03KtEG-cJ5S-cMAD-RlL8-yBXM-jCav-EyD9I3 (last written to /dev/sdc).
LV Attr Cpy%Sync Health Devices
pool_lv rwi---r-p- partial pool_lv_rimage_0(0),pool_lv_rimage_1(0),pool_lv_rimage_2(0),pool_lv_rimage_3(0),pool_lv_rimage_4(0)
[pool_lv_rimage_0] Iwi---r-p- partial [unknown](1)
[pool_lv_rimage_1] Iwi---r--- /dev/sdd(1)
[pool_lv_rimage_2] Iwi---r--- /dev/sde(1)
[pool_lv_rimage_3] Iwi---r--- /dev/sdf(1)
[pool_lv_rimage_4] Iwi---r--- /dev/sdg(1)
[pool_lv_rmeta_0] ewi---r-p- partial [unknown](0)
[pool_lv_rmeta_1] ewi---r--- /dev/sdd(0)
[pool_lv_rmeta_2] ewi---r--- /dev/sde(0)
[pool_lv_rmeta_3] ewi---r--- /dev/sdf(0)
[pool_lv_rmeta_4] ewi---r--- /dev/sdg(0)
From what I understand, the RAID should be able to survive the loss of a single physical disk and come up in a "degraded" state, not "partial", since RAID 6 still has enough redundancy on the surviving disks to present the entire LV.
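The value actually in effect can presumably be queried directly as well (sketch; lvmconfig should be available in this LVM release):

lvmconfig --type full activation/activation_mode   # effective value, including the compiled-in default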
From /etc/lvm/lvm.conf:
# degraded
# Like complete, but additionally RAID LVs of segment type raid1,
# raid4, raid5, radid6 and raid10 will be activated if there is no
# data loss, i.e. they have sufficient redundancy to present the
# entire addressable range of the Logical Volume.
# partial
# Allows the activation of any LV even if a missing or failed PV
# could cause data loss with a portion of the LV inaccessible.
# This setting should not normally be used, but may sometimes
# assist with data recovery.
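Based on that description, my understanding is that explicitly requesting degraded activation should bring the LV up with a single PV missing (sketch):

vgchange -ay --activationmode degraded pool_vg   # allow RAID LVs to activate without full redundancy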
"RAID is not like traditional LVM mirroring. LVM mirroring required
failed devices to be removed or the mirrored logical volume would hang.
RAID arrays can keep on running with failed devices. In fact, for RAID
types other than RAID1, removing a device would mean converting to a
lower level RAID (for example, from RAID6 to RAID5, or from RAID4 or
RAID5 to RAID0)."
However, in my case the RAID is not converted; it's simply not available.
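For completeness, my understanding of the normal recovery path, once the LV is activatable in degraded mode and a spare disk is available, is roughly the following (sketch; /dev/sdh is a hypothetical replacement device):

pvcreate /dev/sdh                    # hypothetical replacement disk
vgextend pool_vg /dev/sdh
lvconvert --repair pool_vg/pool_lv   # rebuild the failed rimage/rmeta onto the new PV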
This is running in a virtual machine on VMware ESXi 7, with LVM version 2.03.07(2) (2019-11-30).
Am I missing something obvious? Appreciate any insights.
Thanks,
Andrei