Re: raid10 with missing redundancy, but health status claims it is ok.

Olaf Seibert <o.seibert@xxxxxxxxxxxx> · Mon, 30 May 2022 10:16:27 +0200

First, John, thanks for your reply.

On 28.05.22 18:15, John Stoffel wrote:
>>>>>> "Olaf" == Olaf Seibert <o.seibert@xxxxxxxxxxxx> writes:
> 
> I'm leaving for the rest of the weekend, but hopefully this will help you...
> 
> Olaf> Hi all, I'm new to this list. I hope somebody here can help me.
> 
> We will try!  But I would strongly urge that you take backups of all
> your data NOW, before you do anything else.  Copy to another disk
> which is separate from this system just in case.

Unfortunately there are some complicating factors that I have left out so far.
The machine in question is a host for virtual machines run by customers,
so we can't even look at the data, never mind rsync it.
(The name "nova" might have given that away; it is the name of the
OpenStack compute service.)

> My next suggestion would be for you to provide the output of the
> 'pvs', 'vgs' and 'lvs' commands.   Also, which disk died?  And have
> you replaced it?    

/dev/sde died. It is still in the machine.

$ sudo pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda2  system lvm2 a--  445.22g 347.95g
  /dev/sdb2  system lvm2 a--  445.22g 347.94g
  /dev/sdc1  nova   lvm2 a--    1.75t 412.19g
  /dev/sdd1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdf1  nova   lvm2 a--    1.75t 812.25g
  /dev/sdg1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdh1  nova   lvm2 a--    1.75t   1.75t
  /dev/sdi1  nova   lvm2 a--    1.75t 412.19g
  /dev/sdj1  nova   lvm2 a--    1.75t 412.19g

$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  nova     7  20   0 wz--n-  12.23t   7.24t
  system   2   2   0 wz--n- 890.45g 695.89g
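
As a rough cross-check of the free space, the PFree column of the nova PVs
can be summed and compared with the VFree figure from vgs. The sketch below
is only an illustration: the helper name `vg_free_gib` is made up, it expects
`pvs` output in the column layout shown above on stdin, and it only
understands the `g` and `t` suffixes.

```shell
#!/bin/sh
# Hypothetical helper: sum the PFree column (last field) of `pvs` output
# for one volume group and print the total in GiB.
# Assumption: only the "g" and "t" suffixes occur, as in the output above;
# lines with any other suffix are skipped.
vg_free_gib() {
    awk -v vg="$1" '
        $2 == vg {
            val  = $NF
            unit = substr(val, length(val))
            num  = substr(val, 1, length(val) - 1) + 0
            if (unit == "t")      num *= 1024
            else if (unit != "g") next
            total += num
        }
        END { printf "%.2f\n", total }
    '
}
```

It could be used as `sudo pvs | vg_free_gib nova`. With the figures above
this comes to roughly 7.25t, against the 7.24t that vgs reports; pvs rounds
each value, so a small difference is to be expected.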

$ sudo lvs
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log         Cpy%Sync Convert
  1b77 nova   Rwi-aor---  50.00g                                            100.00
  1c13 nova   Rwi-aor---  50.00g                                            100.00
  203f nova   Rwi-aor--- 800.00g                                            100.00
  3077 nova   Rwi-aor---  50.00g                                            100.00
  61a0 nova   Rwi-a-r---  50.00g                                            100.00
  63c1 nova   Rwi-aor---  50.00g                                            100.00
  8958 nova   Rwi-aor--- 800.00g                                            100.00
  8a4f nova   Rwi-aor---  50.00g                                            100.00
  965a nova   Rwi-aor--- 100.00g                                            100.00
  9d89 nova   Rwi-aor--- 200.00g                                            100.00
  9df4 nova   Rwi-a-r---  50.00g                                            100.00
  b41b nova   Rwi-aor---  50.00g                                            100.00
  c517 nova   Rwi-aor---  50.00g                                            100.00
  d36b nova   Rwi-aor---  50.00g                                            100.00
  dd1b nova   Rwi-a-r---  50.00g                                            100.00
  e2ed nova   Rwi-aor---  50.00g                                            100.00
  ef6c nova   Rwi-aor---  50.00g                                            100.00
  f5ce nova   Rwi-aor--- 100.00g                                            100.00
  f952 nova   Rwi-aor---  50.00g                                            100.00
  fbf6 nova   Rwi-aor---  50.00g                                            100.00
  boot system mwi-aom---   1.91g                                [boot_mlog] 100.00
  root system mwi-aom---  95.37g                                [root_mlog] 100.00

I am abbreviating the LV names since they are long boring UUIDs 
related to customer data. "203f" is "lvname", the LV which has problems.
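
For reference, the "health" that lvs reports is the 9th character of the
Attr column, as documented in lvs(8). A minimal sketch for decoding it
follows; the function name `lv_health` is made up for illustration and only
the RAID-related values are handled.

```shell
#!/bin/sh
# Hypothetical helper: decode the 9th lv_attr character ("volume health")
# using the meanings listed in lvs(8). Anything else is reported as
# "other/unknown" rather than guessed at.
lv_health() {
    case "$(printf '%s' "$1" | cut -c9)" in
        -) echo "no problem flagged" ;;
        p) echo "partial" ;;
        r) echo "refresh needed" ;;
        m) echo "mismatches exist" ;;
        w) echo "writemostly" ;;
        *) echo "other/unknown" ;;
    esac
}
```

For every LV in the listing above that character is `-`, including "203f",
which is the "health status claims it is ok" part of the subject line.
Assuming the installed LVM supports these report fields, something like
`sudo lvs -a -o lv_name,segtype,lv_health_status,devices nova` should also
list the hidden sub-LVs and the devices backing them.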

> My second suggestion would be for you to use 'md' as the lower level
> RAID1/10/5/6 level underneath your LVM volumes.  A lot of people think
> it's better to have it all in one tool (btrfs, zfs, others) but I
> strongly feel that using nice layers helps keep things organized and
> reliable.
> 
> So if you can, add two new disks into your system, add a full-disk
> partition which starts at an offset of 1 MB or so, and maybe even leaves a
> couple of MBs of free space at the end, and then create an MD pair on
> them:

I am not sure if there are any free slots for more disks. We would need
to send somebody to the datacenter to put in any disks in any case.

I think I understand what you are getting at here, redundancy-wise.
But won't it confuse LVM? If it decides to store one side of any mirror on
this new md0, won't this result in 3 copies of the data for that volume?

In the list of commands I tried, there was this one:

> Olaf> $ sudo lvchange --resync nova/lvname
> Olaf>   WARNING: Not using lvmetad because a repair command was run.
> Olaf>   Logical volume nova/lvname in use.
> Olaf>   Can't resync open logical volume nova/lvname.

Any chance that this command might work, if we can ask the customer to
shut down their VM for a while? 
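
The error message suggests that the LV must not be open, and per lvs(8) the
6th character of the Attr column is `o` while the device is open. A small
sketch of a check that could be run before retrying the resync follows. The
function name `lv_is_open` is made up, and whether the resync would then
actually replace the error segment is something I cannot tell.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given lv_attr string says
# the device is open, i.e. its 6th character is "o" (see lvs(8)).
lv_is_open() {
    [ "$(printf '%s' "$1" | cut -c6)" = "o" ]
}

# Possible use, once the customer has shut down the VM (not run here):
#   attr=$(sudo lvs --noheadings -o lv_attr nova/lvname | tr -d ' ')
#   if lv_is_open "$attr"; then
#       echo "nova/lvname is still open, resync would be refused"
#   else
#       sudo lvchange --resync nova/lvname
#   fi
```

In the lvs listing above, 61a0, 9df4 and dd1b are the only nova LVs that are
not open at the moment.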

On the other hand, some other commands, such as `lvconvert --repair` (which
apparently worked on the other LVs), took a while to run and therefore seemed
to do something, but in the end they didn't.
It seems that this "error" segment (which seems to have replaced the bad disk)
is really confusing LVM.

>    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdy1 /dev/sdz1
>      
> Now you can add that disk in your nova VG with:
> 
>    vgextend nova /dev/md0
> 
> Then try to move your LV named 'lvname' onto the new MD PV.
> 
>    pvmove -n lvname /dev/<source_PV> /dev/md0
> 
> I think you really want to move the *entire* top level LV onto new
> storage.  Then you will know you have safe data.  And this can be done
> while the volume is up and running.
> 
> But again!!!!!!  Please take a backup (rsync onto a new LV maybe?) of
> your current data to make sure you don't lose anything.  
> 
> Olaf> We had a disk go bad (disk commands timed out and took many
> Olaf> seconds to do so) in our LVM installation with mirroring. With
> Olaf> some trouble, we managed to pvremove the offending disk, and
> Olaf> used `lvconvert --repair -y nova/$lv` to repair (restore
> Olaf> redundancy) the logical volumes.
> 
> How many disks do you have in the system?  Please don't try to hide
> names of disks and such unless you really need to.  It makes it much
> harder to diagnose.  

There are 10 disks (sda-j) of which sde is broken and no longer listed.
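
The missing disk can also be read off the pvs output mechanically. The
sketch below is illustrative only: the function name `missing_disks` is made
up, the sda..sdj range is hard-coded for this machine, and a disk that
simply carries no PV would be reported as well.

```shell
#!/bin/sh
# Hypothetical helper: read `pvs` output on stdin and print every disk in
# sda..sdj that has no PV listed. Partition numbers are stripped, so
# /dev/sda2 counts as sda.
missing_disks() {
    present=$(awk '$1 ~ /^\/dev\/sd[a-z]/ {
        d = $1
        sub(/^\/dev\//, "", d)   # drop the /dev/ prefix
        sub(/[0-9]+$/, "", d)    # drop the partition number
        print d
    }')
    for letter in a b c d e f g h i j; do
        # $(echo $present) joins the names with single spaces
        case " $(echo $present) " in
            *" sd$letter "*) ;;
            *) echo "sd$letter" ;;
        esac
    done
}
```

Run as `sudo pvs | missing_disks`, this prints only sde for the pvs output
shown earlier.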

> Olaf> It seems like somehow we must convince LVM to allocate some space for
> Olaf> it, instead of using the error segment (there is plenty available in the
> Olaf> volume group).

Thanks,
-Olaf.

-- 
SysEleven GmbH
Boxhagener Straße 80
10245 Berlin

T +49 30 233 2012 0
F +49 30 616 7555 0

http://www.syseleven.de
http://www.facebook.com/SysEleven
https://www.instagram.com/syseleven/

Current system status always at:
http://www.twitter.com/syseleven

Registered office: Berlin
Court of registration: AG Berlin Charlottenburg, HRB 108571 B
Managing directors: Marc Korthaus, Jens Ihlenfeld, Andreas Hermann
