The info is in the superblock. But if the disk has failed, you may not be able to read the superblock.

Did you say you don't use superblocks? I guess you had better keep a paper trail!

You said:
"when persistent super blocks is used (which I don't)."

But the output from mdadm said:
"Persistence : Superblock is persistent"

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Turbo Fredriksson
Sent: Friday, December 24, 2004 6:10 AM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Which (physical) disk is broken?

>>>>> "Neil" == Neil Brown <neilb@xxxxxxxxxxxxxxx> writes:

Neil> On Wednesday December 22, turbo@xxxxxxxxxx wrote:
>> >>>>> "Guy" == Guy <bugzilla@xxxxxxxxxxxxxxxx> writes:
>>
Guy> If you access your array, every disk in it will have disk
Guy> activity.
>>
>> Heh. That's one way, I guess... I was more hoping for some
>> support for this in mdadm...

Neil> Would be nice.... but until disks have little blue lights
Neil> that can be turned on and off under software control, and
Neil> the linux block-device layer has an interface to access this
Neil> control, there isn't much mdadm can usefully do.

I was thinking more of the 'magic'. When I created the array, I included this broken disk. I later removed it, but there should still (?) be a record of it in the superblock (or wherever mdadm checks). It knows how many disks there SHOULD be, and it knows how many are working.
----- s n i p -----
aurora:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Oct 27 08:12:44 2004
     Raid Level : raid5
     Array Size : 141483520 (134.93 GiB 144.88 GB)
    Device Size : 17685440 (16.87 GiB 18.11 GB)
   Raid Devices : 9
  Total Devices : 8
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec 24 11:45:11 2004
          State : clean, degraded
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 32K

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
       1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
       2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
       3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
       4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
       5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
       6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
       7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
           UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
         Events : 0.1005733
----- s n i p -----

Maybe it's too late now, but couldn't mdadm somehow write a 'tag' saying 'this disk has broken down' and/or 'this disk has been removed from the array'?
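Guy's "paper trail" advice can be made concrete. Below is a minimal Python sketch, not an mdadm feature: it assumes you saved a slot-to-device record (the hypothetical `record` dict) while the array was healthy, and compares it with the device table that `mdadm -D` prints now. Any recorded slot with no current member names the disk that dropped out. The regular expression assumes the five-column table layout shown above.

```python
import re

# One row of the mdadm -D device table: Number, Major, Minor, RaidDevice,
# then the state words and (for live members) the device path.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(.*?)\s*$")


def current_members(table_text):
    """Map RaidDevice slot -> device path for rows that still name a device."""
    members = {}
    for line in table_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line or anything that is not a table row
        slot = int(m.group(4))
        words = m.group(5).split()
        # "removed" rows have RaidDevice -1 and no /dev path, so they are skipped.
        if slot >= 0 and words and words[-1].startswith("/dev/"):
            members[slot] = words[-1]
    return members


def missing_from_record(record, members):
    """Return {slot: path} for recorded slots that have no current member."""
    return {slot: path for slot, path in record.items() if slot not in members}


if __name__ == "__main__":
    # Abbreviated copy of the table above.
    table = """\
    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
       7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
"""
    # Hypothetical paper trail saved while all nine disks were present.
    record = {
        0: "/dev/scsi/host3/bus0/target4/lun0/part1",
        7: "/dev/scsi/host4/bus0/target14/lun0/part1",
        8: "/dev/scsi/host4/bus0/target15/lun0/part1",
    }
    print(missing_from_record(record, current_members(table)))
```

With a record like this the answer to "which physical disk?" comes from your own notes rather than from the failed disk, which may no longer be readable at all.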
I don't know how (software) RAID works at the kernel/hardware level, only how I, as a user/admin, use it :) But if I check one of the disks, scsi4:4:part1 for example:

----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target4/lun0/part1
/dev/scsi/host4/bus0/target4/lun0/part1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
  Creation Time : Wed Oct 27 08:12:44 2004
     Raid Level : raid5
    Device Size : 17685440 (16.87 GiB 18.11 GB)
   Raid Devices : 9
  Total Devices : 8
Preferred Minor : 1

    Update Time : Fri Dec 24 11:49:46 2004
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b038c0d3 - correct
         Events : 0.1005753

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   0     0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
   1     1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
   2     2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
   3     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   4     4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
   5     5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
   6     6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
   7     7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
   8     8       0        0        8      faulty removed
----- s n i p -----

It (mdadm?) knows exactly which md it belongs to, etc... If I check the disk that I THINK is the 'faulty removed' one, it 'knows nothing':

----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----

So, IF this is the disk, then what did mdadm do when it was removed? Did it clear the superblock? Couldn't it write something else instead? Or at least keep the UUID?
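The error "Expected magic a92b4efc, got 00000000" means mdadm looked where a 0.90 superblock should be and found only zeros. Here is a minimal Python sketch of that kind of check, assuming the 0.90 layout described in the kernel's md_p.h (the superblock sits in the last 64 KiB-aligned 64 KiB block of the device, and its first 32-bit word is the magic, stored in host byte order) and a little-endian host such as x86. It works on an in-memory buffer standing in for the partition and is not mdadm's actual code.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC       # mdadm shows this as "Magic : a92b4efc"
MD_RESERVED_BYTES = 64 * 1024  # 0.90 superblocks sit in a reserved 64 KiB area


def sb090_offset(device_bytes):
    """Byte offset of a 0.90 superblock: the device size rounded down to a
    64 KiB boundary, minus 64 KiB (the last aligned 64 KiB block)."""
    return (device_bytes & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def read_magic(image, byteorder="<"):
    """Return the 32-bit word where the 0.90 magic is expected.

    0.90 superblocks are written in the host's native byte order; the
    default '<' assumes a little-endian host.
    """
    (word,) = struct.unpack_from(byteorder + "I", image, sb090_offset(len(image)))
    return word


def has_sb090(image):
    """True if the magic word at the 0.90 superblock location matches."""
    return read_magic(image) == MD_SB_MAGIC


if __name__ == "__main__":
    # In-memory stand-in for a partition; a real check would read the device.
    img = bytearray(1024 * 1024 + 1000)
    print(hex(read_magic(img)), has_sb090(img))  # all zeros, like "got 00000000"
    struct.pack_into("<I", img, sb090_offset(len(img)), MD_SB_MAGIC)
    print(hex(read_magic(img)), has_sb090(img))
```

Because the superblock lives at the end of the partition, a zero word there is what you would see if that area was cleared, or if this partition never carried a 0.90 superblock in the first place; the check alone cannot tell which.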
Hmm, looking through the code (I do that freely, before someone throws it at me :) I see that all mdadm does is an ioctl(), so I guess it's something that has to be done in the kernel (?)... But how come mdadm knows that there's one removed? Common sense? There are 9 raid devices (how come it knows that?) and 8 total devices, hence one must be/have been removed?

Looking at md.c in the kernel, I see that it doesn't write to the disk (when removing a device from an array) when persistent super blocks is used (which I don't). There doesn't seem to be an option in mdadm for this (I do remember using it with raidtools long ago)..

Am I way off? It's early on Christmas Eve, and my blood to caffeine ratio is high... :)

--
Iran arrangements [Hello to all my fans in domestic surveillance] Kennedy
pits 767 Peking Clinton assassination Cocaine PLO Ft. Meade killed $400
million in gold bullion kibo
[See http://www.aclu.org/echelonwatch/index.html for more about this]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
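Turbo's "common sense" guess can be spelled out from the fields mdadm prints: "Raid Devices" is the configured number of slots (9 here), and subtracting "Active Devices" gives how many slots are empty. The Python sketch below does only that arithmetic on the "Key : Value" lines shown earlier; it illustrates the reasoning and makes no claim about how mdadm or md.c actually produce the "removed" row. The helper names are hypothetical.

```python
def parse_fields(text):
    """Collect 'Key : Value' lines from mdadm -D / -E style output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_count(fields):
    """Configured slots minus active members."""
    return int(fields["Raid Devices"]) - int(fields["Active Devices"])


def empty_slots(fields, occupied):
    """Slot numbers 0 .. Raid Devices-1 that no current member occupies."""
    return sorted(set(range(int(fields["Raid Devices"]))) - set(occupied))


if __name__ == "__main__":
    header = """\
   Raid Devices : 9
  Total Devices : 8
 Active Devices : 8
"""
    fields = parse_fields(header)
    print(missing_count(fields))          # 1  (9 - 8)
    print(empty_slots(fields, range(8)))  # [8]  (slots 0-7 are "active sync")
```

The count alone says that a disk is missing and which slot it occupied, but not which physical device it was; that still needs a superblock on the disk or an external record.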