Hopefully someone can shed some light on how to proceed with an LVM hang
problem.
Yesterday I got an email saying that one of the drives did not pass its
SMART self-check. In /var/log/messages I see these lines related to the drive:
================================================================================
Apr 7 17:57:03 grp-01-10-01 smartd[2444]: Device: /dev/hdm, FAILED
SMART self-check. BACK UP DATA NOW!
Apr 7 17:57:03 grp-01-10-01 smartd[2444]: Sending warning via mail to
root ...
Apr 7 17:57:03 grp-01-10-01 smartd[2444]: Warning via mail to root:
successful
Apr 7 18:05:51 grp-01-10-01 kernel: hdm: task_out_intr: status=0x51 {
DriveReady SeekComplete Error }
Apr 7 18:05:52 grp-01-10-01 kernel: hdm: task_out_intr: error=0x10 {
SectorIdNotFound }, LBAsect=238863867, high=14, low=3982843,
sector=238863903
Apr 7 18:05:52 grp-01-10-01 kernel: ide: failed opcode was: unknown
Apr 7 18:05:57 grp-01-10-01 kernel: hdm: task_out_intr: status=0x51 {
DriveReady SeekComplete Error }
Apr 7 18:06:00 grp-01-10-01 kernel: hdm: task_out_intr: error=0x10 {
SectorIdNotFound }, LBAsect=238814880, high=14, low=3933856,
sector=238814887
Apr 7 18:06:00 grp-01-10-01 kernel: ide: failed opcode was: unknown
^^^^^^^^ LOTS OF THESE LINES IN LOG ^^^^^^^^^
Apr 8 02:05:10 grp-01-10-01 kernel: raid1: hdm2: rescheduling sector
54264480
...
Apr 8 02:05:21 grp-01-10-01 kernel: raid1:md0: read error corrected (8
sectors at 54264480 on hdm2)
Apr 8 02:05:22 grp-01-10-01 kernel: raid1: hdc2: redirecting sector
54264480 to another mirror <<===== I DO NOT UNDERSTAND THIS MESSAGE.
THE FAILING DRIVE hdm IS THE OTHER MIRROR FOR hdc2 ????
...
Apr 8 03:02:35 grp-01-10-01 kernel: raid1: hdm2: rescheduling sector
30555792
...
Apr 8 03:02:36 grp-01-10-01 kernel: raid1: Disk failure on hdm2,
disabling device.
Apr 8 03:02:37 grp-01-10-01 kernel: Operation continuing on 1 devices
Apr 8 03:02:37 grp-01-10-01 kernel: raid1: hdc2: redirecting sector
30555792 to another mirror <<===== AND NOW THERE IS NO OTHER MIRROR !
Apr 8 03:02:37 grp-01-10-01 kernel: RAID1 conf printout:
Apr 8 03:02:37 grp-01-10-01 kernel: --- wd:1 rd:2
Apr 8 03:02:37 grp-01-10-01 kernel: disk 0, wo:0, o:1, dev:hdc2
Apr 8 03:02:37 grp-01-10-01 kernel: disk 1, wo:1, o:0, dev:hdm2
Apr 8 03:02:37 grp-01-10-01 kernel: RAID1 conf printout:
Apr 8 03:02:37 grp-01-10-01 kernel: --- wd:1 rd:2
Apr 8 03:02:37 grp-01-10-01 kernel: disk 0, wo:0, o:1, dev:hdc2
Apr 8 03:27:03 grp-01-10-01 smartd[2444]: Device: /dev/hdm, FAILED
SMART self-check. BACK UP DATA NOW!
Apr 8 03:27:03 grp-01-10-01 smartd[2444]: Device: /dev/hdm, 1 Currently
unreadable (pending) sectors
Apr 8 03:27:03 grp-01-10-01 smartd[2444]: Sending warning via mail to
root ...
^^^^^^^^ LOTS OF THESE LINES IN LOG ^^^^^^^^^
================================================================================
So I checked /proc/mdstat, and yes, the md0 RAID1 array shows only one
active drive, hdc2.
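For reference, this is roughly how I checked (mdadm --detail is just the
more verbose way of seeing the same thing):
================================================================================
cat /proc/mdstat          # md0 shows [U_] -- one mirror active, one missing
mdadm --detail /dev/md0   # same picture, with the faulty device listed by name
================================================================================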
So I took a backup and then shut down the system, pulled the bad drive
out, put in a new drive, and rebooted.
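In hindsight, maybe I should have removed the failing partitions from both
arrays with mdadm before pulling the disk; something like this (untested,
device names from my setup):
================================================================================
# hdm2 was already marked faulty by the kernel, so just remove it from md0
mdadm /dev/md0 --remove /dev/hdm2

# hdm1 was only a spare in md1, so it should come straight out as well
mdadm /dev/md1 --remove /dev/hdm1
================================================================================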
The system boots until it gets to the LVM part and then just hangs at
this message:
================================================================================
...
Setting Hostname
Setting up Logical Volume Management (boot hangs right here, icon stops
spinning, cursor is locked)
================================================================================
So my setup consists of two Linux software RAID arrays: a RAID5 array
(md1) and a RAID1 array (md0).
The partition that went bad (hdm2) is part of md0, and another partition
on the same drive (hdm1) acts as a spare for md1.
There is an LVM VG on top of each array, so we have VolumeGroup00 and
VolumeGroup01.
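Once both arrays are up, the stack should be easy to confirm with the
standard LVM reporting tools (a sketch of what I would expect to see, not
actual output from the box):
================================================================================
pvs   # /dev/md0 and /dev/md1 should each show up as a PV
vgs   # should list VolumeGroup00 and VolumeGroup01
lvs   # the logical volumes carved out of each VG
================================================================================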
How should I tackle this problem? I tried rescue mode, but from there I
get no VGs at all and can only see one of the two arrays, md0.
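From the rescue shell, would something like the following be the right way
to bring everything back by hand? This is just a sketch of what I was
planning to try; I have not run it yet:
================================================================================
# Assemble whatever arrays can be found (uses /etc/mdadm.conf if present)
mdadm --assemble --scan
cat /proc/mdstat   # check that both md0 and md1 are now up

# Then look for the PVs and activate both volume groups
pvscan
vgscan
vgchange -ay
================================================================================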
Any pointers would be much appreciated.
Thanks,
Gerry