Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Alvin Oga wrote:
On Wed, 5 Jan 2005, Guy wrote:

I agree, but for a different reason. Your reason is new to me.
...
Losing the swap disk would kill the system.

if one is using swap space ... i'd add more memory .. before i'd use raid - swap is too slow and as you folks point out, it could die due to (unlikely) bad disk sectors in swap area

It isn't always practical. You add as much memory as needed for your "typical workload". But there may be "spikes" of load that you have to deal with somehow. Adding more memory to cover those "spikes" may be too expensive.

Also, if your "typical workload" requires e.g. 2GB of memory, adding
another, say, 2GB to cover "spikes" means you have to reconfigure
the kernel to support a large amount of memory, which also costs
something in terms of speed on the i386 architecture.

Disks are *much* cheaper than RAM in terms of money/MB.

I don't want a down system due to a single disk failure.

that's what raid's for :-)

I mirror everything, or RAID5. Normally, no downtime due to disk failures.

the problem with mirror ( raid1 ).. or raid5 ... - if you have a bad disk ... all "bad data" will/could also get copied to the good disk

Again: pretty PLEASE, stop talking about those mysterious "silent corruption/errors". Errors get detected. It is a *very* unlikely case that an error on disk (either an inability to read, or reading the "wrong" data, i.e. not the same as what was written) will not be detected during the read. If you do care about those cases, you have to use some very different hardware with every component (CPU, memory, buses, controllers etc etc) at least tripled, with hardware-level online monitoring/comparing stuff to detect errors at any level and to switch to another component if one is "lying".
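
To make the "tripled components" idea concrete, here is a minimal Python sketch of majority voting over redundant readings. It is only an illustration: the function name vote() and the sample values are made up, and real fault-tolerant hardware does this comparison in hardware, not in software like this.

```python
from collections import Counter

def vote(readings):
    """Compare redundant readings and return (majority_value, dissenters).

    `readings` holds the same datum as reported by each redundant
    component; `dissenters` lists the indexes that disagree with the
    majority, i.e. the components that are "lying".
    """
    value, count = Counter(readings).most_common(1)[0]
    if count <= len(readings) // 2:
        # With no strict majority there is no way to tell who is wrong.
        raise ValueError("no majority: cannot tell which component is lying")
    dissenters = [i for i, r in enumerate(readings) if r != value]
    return value, dissenters

# Hypothetical readings from three components; the third one disagrees.
value, dissenters = vote([b"abc", b"abc", b"abX"])
```

With only two copies, as in a plain mirror, a disagreement can be noticed but there is no majority to say which copy is right, which is why this kind of setup needs at least three of everything.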

- "bad data" is hard to figure out in code ... to prevent it from
getting copied ... how does it know with 100% certainty

Nothing is 100% certain.. maybe except that we all will die sometime...

	- if you know why it's bad data,  it's lot easier to know which
	data is more correct than the bad one

Nothing is "more correct". If the disk isn't working somehow, we know this (as it reports errors) and kick it from the array. If a disk "does not work silently", see above.
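
The policy described above can be sketched as a toy model in Python. This is not the md driver's code; the Disk and Mirror classes and all names in them are invented for illustration, and the only assumption is that a failing disk reports an error on read.

```python
class DiskError(Exception):
    """Raised when a disk reports that it cannot read a block."""

class Disk:
    def __init__(self, name, blocks, bad=()):
        self.name = name
        self.blocks = dict(blocks)   # block number -> data
        self.bad = set(bad)          # block numbers that fail on read

    def read(self, n):
        if n in self.bad:
            raise DiskError(f"{self.name}: read error on block {n}")
        return self.blocks[n]

class Mirror:
    """Toy RAID1: a disk that reports an error is kicked from the array."""

    def __init__(self, disks):
        self.active = list(disks)
        self.kicked = []

    def read(self, n):
        while self.active:
            disk = self.active[0]
            try:
                return disk.read(n)
            except DiskError:
                # The disk told us it failed, so drop it and try the next one.
                self.kicked.append(self.active.pop(0))
        raise DiskError("all mirrors failed")

# Hypothetical two-disk mirror where the first disk has a bad block 0.
sda = Disk("sda", {0: b"data"}, bad=[0])
sdb = Disk("sdb", {0: b"data"})
array = Mirror([sda, sdb])
result = array.read(0)
```

The read still succeeds from the second disk, and nothing is copied from the failed disk to the good one: the array never has to guess which copy is "more correct", because the error report already says which disk to drop.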

/mjt
