Likewise, when the first disk fails, one could mark it as being in a kind of error state
and keep it running; then, if one gets a read error, the data could be
fetched from the good disks.
Yes!! If a drive is "faulty", it means "you should replace it because it
is failing"; there is no need to actually stop using it and degrade the
whole RAID operation! What's more, it would be extremely useful for
rebuilding without any performance loss: let the array work in degraded
mode while the faulty drive is being copied to the new one, with only
the read errors reconstructed from the rest of the drives!
But that's a different issue, and not a very good idea for other reasons.
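For illustration only, here is a minimal Python sketch of the "copy the faulty drive, reconstruct only its read errors" idea, under simplifying assumptions: drives are in-memory lists of blocks, `None` stands for an unreadable sector, and parity is plain XOR on a fixed parity drive (a RAID-4-style layout rather than RAID-5's rotating parity). All names (`read_block`, `copy_rebuild`, etc.) are hypothetical; this is not how md is actually implemented.

```python
from functools import reduce
from typing import List, Optional, Tuple

Block = bytes
Drive = List[Optional[Block]]  # None stands for an unreadable sector (hypothetical model)


def xor_blocks(blocks: List[Block]) -> Block:
    """XOR equal-length blocks byte by byte, as RAID-5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(drives: List[Drive], drive_idx: int, block_idx: int) -> Block:
    """Read one block; on a read error, rebuild it from the other drives."""
    data = drives[drive_idx][block_idx]
    if data is not None:
        return data  # the "faulty" drive still answered, so trust its data
    others = [d[block_idx] for i, d in enumerate(drives) if i != drive_idx]
    if any(b is None for b in others):
        # Second read error in the same stripe: report it, don't invent data.
        raise IOError(f"block {block_idx}: unrecoverable read error")
    return xor_blocks(others)


def copy_rebuild(drives: List[Drive], faulty_idx: int) -> Tuple[List[Block], int]:
    """Copy the faulty drive to a new one, reconstructing only unreadable blocks."""
    new_drive: List[Block] = []
    reconstructed = 0
    for block_idx, data in enumerate(drives[faulty_idx]):
        if data is None:
            reconstructed += 1
        new_drive.append(read_block(drives, faulty_idx, block_idx))
    return new_drive, reconstructed
```

With two data drives plus a parity drive, a faulty first drive whose second block is unreadable gets copied block for block, and only that one block is computed as the XOR of the other two drives' blocks; everything else is read straight off the failing drive.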
One big reason is human behaviour. And in the end, it is human behaviour
that causes all the collapsed RAIDs.
"Human behaviour", that's what I'm talking about. If the only reason to
do it is to force people to do what is necessary, that approach is
called "Windows". :) And I do not suggest that it should be the default
behaviour; instead, we should have an option "--idiotmode
--yes-i-know-what-i-am-doing" at RAID creation for those who
specifically want to take the risks.
And of course, no broken files will appear if we suffer from read
*errors*. We do not suffer from *incorrect reads*, right?
You make it sound like it solves all problems, but it does not.
Errors are just not part of the concept anywhere really.
It does not "solve all problems", but it lets me solve my problems my
way, and not "the only correct and intended way" - which is what Linux
is good at. :)
> I believe this is the dream of everyone who had ever dealt with RAIDs.
My dream is different. I don't want errors. I want it to work. ;)
And it does, as long as you make sure your disks are healthy.
I do not suggest that we do it my way and not yours - we have an option
to do it your way, but we do not have one to do it my way, that's the
problem. :)
Anyway, if I had a collapsed RAID-5, I would at least want an
easy option to start it in read-only mode in the last-known working
state, while the faulty drives are not yet out of sync, and recover
data easily (to my single backup drive), or continue using the array for
a while, manually deleting one "bad" file if necessary. This is of
course not a "good thing" to do, but this way, RAID would at least be
no worse than a single drive with faulty sectors, which is capable of
that, while a RAID is not! I would be fine with that in my archive, just
as I'm fine with some less important parts of the archive being on faulty
single drives. It's just that I don't want to lose a whole drive's worth
of data to a hardware failure - and RAID adds more ways for that to
happen instead of offering more protection against it.
> Using cosmetics to hide errors only works to a certain limit.
> In the end, RAID only works if the disks work. RAID 5 with
> two dead disks is dead, no way to get around that. Disks go bad
> and need to be replaced; if you don't do that, you'll just fail
> even more horribly later on.
Concur. We seem to differ on where to draw the line on "bad".
And I think that line should be easy to move, so that anyone could
choose their own! I understand that RAID is meant for "uptime, not
backups" - for enterprise production. And everything that you say is
correct for that case. However, there are other uses - like mirroring
my backup archive to protect against whole-drive failures. And in this
case, I want different behaviour; I can take it upon myself to make sure
a read error won't make my filesystems go into read-only mode and break
anything, I really know what I'm doing, and I don't need my computer to
tell me that RAID is not supposed to be used in this way. And it
shouldn't add a lot of complex code - just a test "if idiotmode and
lastdisk then return error, else kick drive; shout like crazy either
way". :)
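The proposed test ("if idiotmode and lastdisk then return error, else kick drive; shout like crazy either way") can be sketched in a few lines of Python. This is only a model of the decision, not md code: `idiot_mode` stands for the joke `--idiotmode` opt-in above (not a real mdadm flag), `last_disk` is assumed to mean "kicking this drive would leave the data with no copy at all", and the `Action` names are hypothetical.

```python
import logging
from enum import Enum

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("raid-policy")  # hypothetical logger name


class Action(Enum):
    RETURN_ERROR = "pass the read error up, keep the drive in the array"
    KICK_DRIVE = "mark the drive faulty and remove it from the array"


def on_read_error(drive: str, idiot_mode: bool, last_disk: bool) -> Action:
    """Decide what to do with a drive that returned a read error.

    idiot_mode: hypothetical opt-in chosen at array creation.
    last_disk: True if kicking this drive would leave no copy of the data.
    """
    if idiot_mode and last_disk:
        action = Action.RETURN_ERROR  # the filesystem sees a read error, array stays up
    else:
        action = Action.KICK_DRIVE  # the usual behaviour
    # "Shout like crazy either way": log loudly whichever branch is taken.
    log.critical("read error on %s: %s", drive, action.value)
    return action
```

Only the combination of the opt-in and the last-copy case changes anything; every other path keeps the default "kick the drive" behaviour, which is why the opt-in would not need to be the default.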
It's just that everyone has their own opinion on where to draw the line,
and the "intended" one should of course be preached, but not forced!
--
darkpenguin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html