This is what you said:
"A more consistent and risk-free way of doing it would probably be to do the above partial resync in a background thread or so?.."

This sounded like Neil's current plan. But if I understand the plan, the drive would be kicked out of the array. A log would track which stripes were affected by the bad block and by other writes to the array. A partial re-sync would be done and the disk put back into the array. I think the array would be degraded during the re-sync. This is why I made my comments.

On a related subject, I don't like Neil's plan if it causes the array to be degraded, since while degraded, if another disk has a bad block, the array then goes off-line. Not a good plan IMO.

I also never said a full re-sync; I was referring to correcting the bad block(s).

And 1000 bad blocks! I have never had 2 on the same disk at the same time, AFAIK. I would agree that 1000 would put a strain on the system! Normally only 1 block is bad at a time IMO. However, I guess it is possible to have a full track go bad.

Sometime in the past I said there should be a threshold on the number of bad blocks allowed. Once the threshold is reached, the disk should be assumed bad, or at least failing, and should be replaced.

I would also say the bad block repair should have a speed limit so the effect on the overall system is minimized. Maybe only allow it to repair 1 bad block per minute (a configurable parameter). But your 1000 bad block example would then take almost 17 hours. I think 1000 bad blocks at one time is an indication you have a head failure. In that case, the disk is bad.

Does anyone know how many spare blocks are on a disk? My worst disk has 28 relocated bad blocks.

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Molle Bestefich
Sent: Wednesday, March 02, 2005 7:05 AM
To: Guy
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Joys of spare disks!

Hm..
I said partial resync, because a full resync would be a waste of time if it's just a thousand sectors or so that need to be relocated.

Anyhow. There's no overhead to the application with the (theoretically "partial") degraded mode, since it happens in parallel. The latency of doing it while the read operation is ongoing would be, say, 3 seconds or so per bad sector on a standard disk? Imagine a thousand bad sectors, and any sane person would quickly pull the plug from the dead box and have it resync when it boots instead of staring at a hung system. When that happens there's even the risk that the resync fails completely, if md decides to pull one of the disks other than the one with bad blocks on it from the array before it resyncs.

I prefer the first scenario (the system keeps running, the array isn't potentially destroyed), even if it means a slightly lower I/O rate and thus a minor overhead if and only if running applications utilize the I/O subsystem 100%..

Am I wrong?

Guy wrote:
> I think the overhead related to fixing the bad blocks would be insignificant
> compared to the overhead of degraded mode.
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx
> [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Molle Bestefich
> Sent: Tuesday, March 01, 2005 10:51 PM
> To: linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: Joys of spare disks!
>
> Robin Bowes wrote:
> > I envisage something like:
> >
> > md attempts read
> > one disk/partition fails with a bad block
> > md re-calculates correct data from other disks
> > md writes correct data to "bad" disk
> > - disk will re-locate the bad block
>
> Probably not that simple, since sometimes multiple blocks will go
> bad, and you wouldn't want the entire system to come to a screeching
> halt whenever that happens.
>
> A more consistent and risk-free way of doing it would probably be to
> do the above partial resync in a background thread or so?..
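For anyone following the thread, the read-repair steps Robin lists and the repair speed limit Guy suggests can be sketched roughly as below. This is only an illustration, not md's code: it assumes a single-parity (RAID-5 style) stripe held in memory, and the names xor_blocks, repair_stripe and repair_hours are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_stripe(stripe, bad_index):
    """Rebuild the block at bad_index from the other blocks and write it back.

    Assumption: `stripe` is a list of bytearrays holding the data blocks plus
    one XOR parity block, so any single block equals the XOR of all the others.
    """
    survivors = [bytes(b) for i, b in enumerate(stripe) if i != bad_index]
    rebuilt = xor_blocks(survivors)
    # On a real disk, this rewrite is what gives the drive the chance to
    # relocate the bad sector; here it only overwrites the in-memory copy.
    stripe[bad_index][:] = rebuilt
    return rebuilt


def repair_hours(bad_blocks, repairs_per_minute=1):
    """Hours needed to repair bad_blocks at a throttled repair rate."""
    return bad_blocks / repairs_per_minute / 60


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    stripe = [bytearray(d) for d in data] + [bytearray(xor_blocks(data))]
    stripe[1][:] = b"\x00" * 4          # simulate an unreadable block
    print(repair_stripe(stripe, 1))     # rebuilt from the other three blocks
    print(repair_hours(1000))           # 1000 blocks at 1 per minute
```

At 1 repair per minute, 1000 blocks works out to 1000 / 60, or about 16.7 hours, which matches the "almost 17 hours" figure above.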
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html