RE: Which (physical) disk is broken?

"Guy" <bugzilla@xxxxxxxxxxxxxxxx> · Fri, 24 Dec 2004 12:14:07 -0500

The info is in the superblock.  But if the disk has failed, you may not be
able to read the superblock.

Did you say you don't use superblocks?
I guess you better keep a paper trail!
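
A paper trail does not have to be on paper. The sketch below is only an
illustration: it assumes you recorded a slot-to-device map while the array
was still complete, and it reports which recorded members are no longer
present. The function names, the JSON file and the slot 8 path (the disk
the original poster only suspects) are all hypothetical.

```python
import json
import os
import tempfile


def save_inventory(slots, path):
    """Write a {raid_slot: device_path} map to a JSON file."""
    with open(path, "w") as f:
        # JSON object keys must be strings, so convert the slot numbers.
        json.dump({str(k): v for k, v in slots.items()}, f, indent=2)


def load_inventory(path):
    """Read the map back, turning the slot keys into integers again."""
    with open(path) as f:
        return {int(k): v for k, v in json.load(f).items()}


def missing_members(inventory, current):
    """Return the recorded members whose slot is absent from `current`."""
    return {s: p for s, p in sorted(inventory.items()) if s not in current}


if __name__ == "__main__":
    # Hypothetical record taken when all members were present.
    healthy = {
        7: "/dev/scsi/host4/bus0/target14/lun0/part1",
        8: "/dev/scsi/host4/bus0/target15/lun0/part1",  # assumed, not confirmed
    }
    # What the array reports today (slot 8 is gone).
    today = {7: "/dev/scsi/host4/bus0/target14/lun0/part1"}

    with tempfile.TemporaryDirectory() as tmp:
        record = os.path.join(tmp, "md1-inventory.json")
        save_inventory(healthy, record)
        print(missing_members(load_inventory(record), today))
```

The record has to be made before a disk fails, which is the whole point of
keeping one.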

You said:
"when persistent super blocks is used (which I don't)."

But the output from mdadm said:
"Persistence : Superblock is persistent"

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Turbo Fredriksson
Sent: Friday, December 24, 2004 6:10 AM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Which (physical) disk is broken?

>>>>> "Neil" == Neil Brown <neilb@xxxxxxxxxxxxxxx> writes:

    Neil> On Wednesday December 22, turbo@xxxxxxxxxx wrote:
    >> >>>>> "Guy" == Guy <bugzilla@xxxxxxxxxxxxxxxx> writes:
    >> 
    Guy> If you access your array, every disk in it will have disk
    Guy> activity.
    >>  He. That's one way I guess... I was more hoping for some
    >> support for this in mdadm...

    Neil> Would be nice....  but until disks have little blue lights
    Neil> that can be turned on and off under software control, and
    Neil> the linux block-device layer has an interface to access this
    Neil> control, there isn't much mdadm can usefully do.

I was thinking more of the 'magic'. When I created the array, I included
this broken disk. I later removed it, but there should still (?) be a
record of it in the super block (or wherever mdadm checks).

It knows how many disks there SHOULD be, and it knows how many are working.

----- s n i p -----
aurora:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Oct 27 08:12:44 2004
     Raid Level : raid5
     Array Size : 141483520 (134.93 GiB 144.88 GB)
    Device Size : 17685440 (16.87 GiB 18.11 GB)
   Raid Devices : 9
  Total Devices : 8
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec 24 11:45:11 2004
          State : clean, degraded
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 32K

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
       1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
       2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
       3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
       4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
       5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
       6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
       7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
           UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
         Events : 0.1005733
----- s n i p -----

Maybe it's too late now, but can't mdadm write a 'tag' somehow saying 'this
disk has broken down' and/or 'this disk has been removed from the array'?

I don't know how the (software) RAID works at the kernel/hardware level, only
how I, as a user/admin, use it :)

But if I check one of the disks, scsi4:4:part1 for example:

----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target4/lun0/part1
/dev/scsi/host4/bus0/target4/lun0/part1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
  Creation Time : Wed Oct 27 08:12:44 2004
     Raid Level : raid5
    Device Size : 17685440 (16.87 GiB 18.11 GB)
   Raid Devices : 9
  Total Devices : 8
Preferred Minor : 1

    Update Time : Fri Dec 24 11:49:46 2004
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b038c0d3 - correct
         Events : 0.1005753

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   0     0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
   1     1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
   2     2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
   3     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   4     4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
   5     5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
   6     6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
   7     7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
   8     8       0        0        8      faulty removed
----- s n i p -----

It (mdadm?) knows exactly what md it belongs to etc... If I check
the disk that I THINK is the 'faulty removed' one, it 'knows nothing':

----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----

So, IF this is the disk, then what did mdadm do when it was removed?
Did it clear the super block? Couldn't it write something else instead?
Or at least keep the UUID?
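
For anyone who wants to repeat the check by hand: the "Expected magic
a92b4efc, got 00000000" message above comes down to reading one 32-bit word.
The sketch below assumes the version 0.90 layout, where (as far as I recall
from the kernel's md_p.h) the superblock sits in the last 64 KiB-aligned
64 KiB block of the device and starts with the magic in host byte order.
The offset rule and the helper names are assumptions to verify against the
kernel source, not something taken from the mdadm output. The demo uses a
scratch file rather than a real disk.

```python
import os
import struct
import tempfile

MD_SB_MAGIC = 0xA92B4EFC          # value printed by mdadm -E above
MD_RESERVED_BYTES = 64 * 1024     # assumed size of the 0.90 reserved area


def sb_offset_090(device_size_bytes):
    """Assumed 0.90 rule: round the size down to 64 KiB, then step back 64 KiB."""
    return (device_size_bytes & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def read_magic_090(path):
    """Return the 32-bit word where a 0.90 superblock would start, or None."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)      # also works for block devices on Linux
        if size < 2 * MD_RESERVED_BYTES:
            return None                    # too small to hold a superblock
        f.seek(sb_offset_090(size))
        raw = f.read(4)
    if len(raw) != 4:
        return None
    return struct.unpack("=I", raw)[0]     # "=I": native byte order, 4 bytes


def has_md_superblock_090(path):
    """True if the word at the assumed offset equals the md magic."""
    return read_magic_090(path) == MD_SB_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "member.img")
        with open(image, "wb") as f:
            f.truncate(200 * 1024)                      # zero-filled scratch image
            f.seek(sb_offset_090(200 * 1024))
            f.write(struct.pack("=I", MD_SB_MAGIC))     # plant the magic
        print(has_md_superblock_090(image))
```

A zeroed word at that offset only says no 0.90 superblock is there now; it
does not say who cleared it or when.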

Hmm, looking through the code (I do that freely, before someone throws
it at me :) I see that all mdadm does is an ioctl(), so I guess it's something
that has to be done in the kernel (?)...

But how come mdadm knows that there's one removed? Common sense? There are
9 raid devices (how come it knows that?) and 8 total devices, hence one
must be/have been removed?
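
The "9 raid devices, 8 listed" reasoning can be written down as a small
script. This is a sketch only: it assumes the text layout of the `mdadm -D`
output shown earlier (including the mail-wrapped device paths), and the
function names are made up for illustration. It does not handle the
`mdadm -E` table, which has an extra column.

```python
import re

# Sample copied from the mdadm -D output in this post.
SAMPLE = """\
   Raid Devices : 9
  Total Devices : 8

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync
/dev/scsi/host3/bus0/target4/lun0/part1
       1       8       81        1      active sync
/dev/scsi/host3/bus0/target8/lun0/part1
       2       8       97        2      active sync
/dev/scsi/host3/bus0/target9/lun0/part1
       3       8      241        3      active sync
/dev/scsi/host4/bus0/target4/lun0/part1
       4      65        1        4      active sync
/dev/scsi/host4/bus0/target5/lun0/part1
       5      65       17        5      active sync
/dev/scsi/host4/bus0/target8/lun0/part1
       6      65       33        6      active sync
/dev/scsi/host4/bus0/target9/lun0/part1
       7      65      113        7      active sync
/dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
"""

ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(.*)$")
RAID_DEVICES = re.compile(r"^\s*Raid Devices\s*:\s*(\d+)")


def parse_mdadm_detail(text):
    """Return (raid_devices, {slot: device_path}) for the active members."""
    raid_devices = None
    slots = {}
    pending_slot = None  # active row whose path was wrapped onto the next line
    for line in text.splitlines():
        header = RAID_DEVICES.match(line)
        row = ROW.match(line)
        if header:
            raid_devices = int(header.group(1))
        elif row:
            pending_slot = None
            slot, rest = int(row.group(4)), row.group(5)
            if "active" in rest:
                path = next((w for w in rest.split() if w.startswith("/dev/")), None)
                if path:
                    slots[slot] = path
                else:
                    pending_slot = slot
        elif pending_slot is not None and line.strip().startswith("/dev/"):
            slots[pending_slot] = line.strip()
            pending_slot = None
    return raid_devices, slots


def missing_slots(raid_devices, slots):
    """Slots the array expects but for which no active device is listed."""
    return [i for i in range(raid_devices) if i not in slots]


if __name__ == "__main__":
    count, members = parse_mdadm_detail(SAMPLE)
    print(missing_slots(count, members))  # prints [8] for the sample above
```

This tells you which slot is empty, not which physical disk used to fill it.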

Looking at md.c in the kernel, I see that it doesn't write to the disk
(when removing a device from an array) when persistent super blocks
is used (which I don't). There doesn't seem to be an option in mdadm
for this (I do remember using it with raidtools long ago)...

Am I way off? It's early on Christmas Eve, and my blood-to-caffeine ratio
is high... :)
-- 
Iran arrangements [Hello to all my fans in domestic surveillance]
Kennedy pits 767 Peking Clinton assassination Cocaine PLO Ft. Meade
killed $400 million in gold bullion kibo
[See http://www.aclu.org/echelonwatch/index.html for more about this]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
