Re: Why not just return an error?

> Likewise, when the first disk fails, one could mark it as kind of in an error state,
> and keep it running, and if one gets a read error, then you could get
> the data from the good disks.

Yes!! If a drive is "faulty", that means "you should replace it because it is failing"; there is no need to actually stop using it and degrade the whole array! What's more, it would be extremely useful for rebuilding without any performance loss: let the array keep working in degraded mode while the faulty drive is copied to the new one, with only its read errors reconstructed from the rest of the drives! But that's a different issue, and not a very good idea for other reasons.
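
To make the rebuild idea concrete, here is a toy sketch, assuming a RAID-5-style layout where the blocks of each stripe XOR to zero (so any one member's block equals the XOR of the others). `Disk`, `read_block` and `rebuild_from_faulty` are hypothetical names for illustration only, not md or mdadm code; real md also has to deal with chunk layouts and concurrent writes, which this ignores.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import xor


class ReadError(Exception):
    """Raised by a simulated disk when a block cannot be read."""


@dataclass
class Disk:
    # ASSUMPTION: one int per stripe stands in for a whole chunk
    blocks: list[int]
    bad: set[int] = field(default_factory=set)  # unreadable block numbers

    def read(self, n: int) -> int:
        if n in self.bad:
            raise ReadError(n)
        return self.blocks[n]


def reconstruct(others: list[Disk], n: int) -> int:
    # XOR parity: block n of the missing member is the XOR of block n on all others
    return reduce(xor, (d.read(n) for d in others), 0)


def read_block(member: Disk, others: list[Disk], n: int) -> int:
    """Keep reading from a 'faulty' member; only failed reads touch the rest of the array."""
    try:
        return member.read(n)
    except ReadError:
        return reconstruct(others, n)


def rebuild_from_faulty(faulty: Disk, others: list[Disk]) -> Disk:
    """Copy the faulty member to a fresh disk, reconstructing only its bad blocks."""
    return Disk([read_block(faulty, others, n) for n in range(len(faulty.blocks))])
```

For example, with data members `[1, 2, 3]` and `[4, 5, 6]` and parity `[5, 7, 5]`, a second member with an unreadable block 1 is still copied as `[4, 5, 6]`: block 1 comes from `2 ^ 7`, everything else straight off the failing drive. The rebuild only fails if the same block is also unreadable on another member.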


> One big reason is human behaviour. And it is human behaviour that in the
> end causes all the collapsed raids.

"Human behaviour", that's what I'm talking about. If the only reason to do it this way is to force people to do what is necessary, that approach is called "Windows". :) And I am not suggesting that it should be the default behaviour; instead, there should be an option like "--idiotmode --yes-i-know-what-i-am-doing" at RAID creation for those who specifically want to take the risk.

And of course, no broken files will appear if we suffer from read *errors*. We do not suffer from *incorrect reads*, right?..


> You make it sound like it solves all problems, but it does not.
> Errors are just not part of the concept anywhere really.

It does not "solve all problems", but it lets me solve my problems my way, and not "the only correct and intended way" - which is what Linux is good at. :)


> I believe this is the dream of everyone who had ever dealt with RAIDs.

> My dream is different. I don't want errors. I want it to work. ;)
> And it does, as long as you make sure your disks are healthy.

I'm not suggesting we do it my way instead of yours: we have an option to do it your way, but none to do it my way, and that's the problem. :)

Anyway, if I had a collapsed RAID-5, I would want at least an easy option to start it read-only in its last-known working state, while the faulty drives are still not out of sync, and recover the data easily (to my single backup drive), or keep using the array for a while, manually deleting a "bad" file if necessary. This is of course not a "good thing" to do, but it would make RAID at least no worse than a single drive with faulty sectors, which lets you do exactly that, while a RAID does not! I would be fine with that for my archive, just as I'm fine with some less important parts of the archive living on faulty single drives. I just don't want to lose a whole drive's worth of data to a hardware failure, and RAID adds more ways to lose everything instead of offering more protection against that one.


> Using cosmetics to hide errors only works to a certain limit.
> In the end, RAID only works if the disks work. RAID 5 with
> two dead disks is dead, no way to get around that. Disks go bad
> and need to be replaced, if you don't do that, you'll just fail
> even more horribly later on.

> Concur.  We seem to differ on where to draw the line on "bad".

And I think that line should be easy to move, so that everyone could choose their own! I understand that RAID is meant for "uptime, not backups", i.e. for enterprise production, and everything you say is correct for that case. However, there are other uses, like mirroring my backup archive to protect against whole-drive failures. In that case I want different behaviour: I can take it upon myself to make sure a read error won't send my filesystems into read-only mode and break anything. I really know what I'm doing, and I don't need my computer to tell me that RAID is not supposed to be used this way. And it shouldn't take much complex code, just a test: "if idiotmode and lastdisk then return error, else kick drive; shout like crazy either way". :)
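
Spelled out, that test is a tiny policy function. The sketch below is a hypothetical illustration, not anything md or mdadm actually implements: `idiotmode` is the opt-in flag proposed above, and `would_fail_array` is an assumed stand-in for "lastdisk", i.e. kicking this member would leave the array with no redundancy to serve reads from.

```python
import logging
from enum import Enum

log = logging.getLogger("toy-raid")  # hypothetical logger name


class Action(Enum):
    RETURN_ERROR = "return a read error to the caller, keep the member in the array"
    KICK_DRIVE = "mark the member failed (current behaviour)"


def on_unrecoverable_read_error(member: str, idiotmode: bool, would_fail_array: bool) -> Action:
    # "Shout like crazy either way": log loudly regardless of the decision.
    log.critical("unrecoverable read error on %s", member)
    # ASSUMPTION: only the opt-in mode on the last usable disk changes anything;
    # every other case keeps the default "kick the drive" behaviour.
    if idiotmode and would_fail_array:
        return Action.RETURN_ERROR
    return Action.KICK_DRIVE
```

The default stays exactly as it is today; only someone who explicitly opted in, on the one disk whose removal would collapse the array, gets a per-read error (like a single drive with a bad sector) instead of a dead array.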

It's just that everyone has their own opinion on where to draw the line, and the "intended" one should of course be preached, but not forced!

--
darkpenguin