On Wed, 2007-05-30 at 22:28 -0400, Mike Accetta wrote:
> Alberto Alonso writes:
> > OK, lets see if I can understand how a disk gets flagged
> > as bad and removed from an array. I was under the impression
> > that any read or write operation failure flags the drive as
> > bad and it gets removed automatically from the array.
> >
> > However, as I indicated in a prior post I am having problems
> > where the array is never degraded. Does an error of type:
> > end_request: I/O error, dev sdb, sector ....
> > not count as a read/write error?
>
> I was also under the impression that any read or write error would
> fail the drive out of the array but some recent experiments with error
> injecting seem to indicate otherwise at least for raid1. My working
> hypothesis is that only write errors fail the drive. Read errors appear
> to just redirect the sector to a different mirror.
>
> I actually ran across what looks like a bug in the raid1
> recovery/check/repair read error logic that I posted about
> last week but which hasn't generated any response yet (cf.
> http://article.gmane.org/gmane.linux.raid/15354). This bug results in
> sending a zero length write request down to the underlying device driver.
> A consequence of issuing a zero length write is that it fails at the
> device level, which raid1 sees as a write failure, which then fails the
> array. The fix I proposed actually has the effect of *not* failing the
> array in this case since the spurious failing write is never generated.
> I'm not sure what is actually supposed to happen in this case. Hopefully,
> someone more knowledgeable will comment soon.
> --
> Mike Accetta

I was starting to think that nobody was getting my posts. I know there
are plenty of people here who understand raid, yet I didn't get any
answers to any of my related posts.
After thinking about your post, I guess I can see some logic behind
not failing the drive on a read error, although I would say that after
some number x of read failures a drive should be kicked out no matter
what.

In my case I believe the errors occur during writes, which makes it
even more confusing that the array is never degraded.

Unfortunately I've never written any kind of disk I/O code, so I am
afraid of looking at the code and getting completely lost.

Alberto
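
For illustration only, the policy discussed above (a write error fails
the drive at once; a read error is redirected to another mirror, but
the drive is kicked out after x read failures) could be modeled roughly
as below. This is a toy sketch in Python, not the md raid1 driver's
logic. The names Mirror, ToyRaid1 and max_read_errors are hypothetical
and do not come from the kernel.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    """One member disk of the toy array (hypothetical type)."""
    name: str
    read_errors: int = 0
    failed: bool = False


class ToyRaid1:
    """Toy model of the failure policy discussed in this thread."""

    def __init__(self, names, max_read_errors=3):
        self.mirrors = [Mirror(n) for n in names]
        # Hypothetical threshold: the "x" read failures mentioned above.
        self.max_read_errors = max_read_errors

    def _get(self, name):
        return next(m for m in self.mirrors if m.name == name)

    def on_write_error(self, name):
        # Working hypothesis from the thread: a write error fails the drive.
        self._get(name).failed = True

    def on_read_error(self, name):
        # A read error is only counted here; the read itself would be
        # retried on another mirror. The drive is failed once the count
        # reaches the threshold.
        mirror = self._get(name)
        mirror.read_errors += 1
        if mirror.read_errors >= self.max_read_errors:
            mirror.failed = True

    def degraded(self):
        # The array is degraded as soon as any member has been failed.
        return any(m.failed for m in self.mirrors)
```

With max_read_errors=3, two read errors on sdb leave the toy array
intact and the third one degrades it, while a single write error
degrades it immediately.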