Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Neil Brown <neilb@xxxxxxxxxxxxxxx> wrote:
> On Tuesday January 4, ptb@xxxxxxxxxxxxxx wrote:
> > Bits flip on our client disks all the time :(.  
> 
> You seem to be alone in reporting this.  I certainly have never
> experienced anything quite like what you seem to be reporting.

I don't feel the need to prove it to you via actual evidence.  You
already know of mechanisms which produce such an effect:

> Certainly there are reports of flipped bits in memory. 

... and that is all the same to your code when it comes to resyncing.
You don't care whether the change is real or produced in the CPU, on the
bus, or wherever. It is still what you will observe and copy.

> If you have
> non-ecc memory, then this is a real risk and when it happens you
> replace the memory.


Sure.

> Usually it happens with a sufficiently high
> frequency that the computer is effectively unusable.

Well, there are many computers that remain usable. When I see bit flips
the first thing I request the techs to do is check the memory and keep
on checking it until they find a fault. I also ask them to check the
fans, clean out dust and so on.

In a relatively small percentage of cases, it turns out that the changes
are real, on the disk, and persist from reboot to reboot, and move with
the disk when one moves it from place to place.  I don't know where
these come from - perhaps from the drive electronics, perhaps from the
disk. 

> But bits being flipped on disk, without the drive reporting an error,
> and without the filesystem very quickly becoming unusable, is (except
> for your report) unheard of.

As far as I recall, when it happens it is usually a bit flipped throughout
a range of consecutive addresses on disk. I haven't been
monitoring this daily check for about a year now, however, so I don't
have any data to show you.
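
For illustration only, here is a minimal Python sketch of how such a
pattern could be picked out of two copies of the same region. It assumes
that "a bit flipped throughout a range" means the same bit position is
wrong in consecutive bytes, and that a known-good copy is available to
compare against. The function name flipped_runs is hypothetical; it is
not the daily check mentioned above.

```python
def flipped_runs(good: bytes, bad: bytes):
    """Return (start, length, xor_mask) for each run of consecutive
    differing bytes that share the same XOR mask.

    Hypothetical helper: 'good' is assumed to be a trusted copy.
    """
    if len(good) != len(bad):
        raise ValueError("copies must be the same length")
    runs = []
    start = None   # offset where the current run began, or None
    mask = 0       # XOR mask shared by the bytes of the current run
    for i, (a, b) in enumerate(zip(good, bad)):
        x = a ^ b
        if x and start is not None and x == mask:
            continue  # same flipped bits as the previous byte: extend run
        if start is not None:
            # The current run ends just before offset i.
            runs.append((start, i - start, mask))
            start = None
        if x:
            start, mask = i, x
    if start is not None:
        runs.append((start, len(good) - start, mask))
    return runs
```

A long run with a single-bit mask (0x01, 0x02, 0x04, ...) is what the
description above would look like; scattered runs with unrelated masks
would suggest something else.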

> md/raid would definitely not help that sort of situation at all.

Nor is there any reason to suggest it should - it just doesn't check.
It could.
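
To make "it could" concrete, here is a rough userspace sketch in Python
of the kind of check meant: read two mirror components chunk by chunk and
report where they disagree. This is not how md is implemented. The
function name mismatched_chunks and the chunk size are assumptions, and
the sketch presumes the two paths are images (or quiescent component
devices) holding the same data area with no writes in flight.

```python
def mismatched_chunks(path_a, path_b, chunk_size=64 * 1024):
    """Yield the byte offset of every chunk that differs between two images.

    Hypothetical helper for illustration; paths and chunk size are
    assumptions, not anything md itself uses.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both images exhausted
            if a != b:
                yield offset  # contents (or lengths) disagree here
            offset += max(len(a), len(b))
```

Note that with only two copies a mismatch says that the mirrors differ,
not which one is right; deciding that needs a third source such as a
checksum or another mirror.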



Peter

