Suspected corruption of ACID databases due to missing barrier support in ext3 on software RAID-5 with hard resets

Hello all,

we are quite sure that, in a few percent of cases, we are hitting data
corruption in ACID* databases because write caching is enabled on the
drives of a software RAID-5 array running ext3 in its default
data=ordered mode.

The machines are hard-reset by a hardware watchdog whenever a certain
esoteric PCI device misbehaves.

Our understanding is that Linux software RAID-5 does not pass barriers
down to the member drives. Is that correct, and is barrier support
being implemented?
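
For context, below is a minimal sketch of the kind of commit ordering an ACID database typically relies on, assuming a simple write-ahead log (the `commit` helper and the file names are hypothetical, not taken from any particular database). The log record is fsync()ed before the data file is touched. If cache-flush requests (barriers) are not passed down to the member drives, fsync() may return while the record is still only in a drive's volatile write cache, and the drive is free to write its cached blocks in any order. A hard reset can then lose the log record while the later data write survives, which would be consistent with the corruption described above.

```python
import os


def commit(log_path: str, data_path: str, record: bytes) -> None:
    """Hypothetical write-ahead commit: the log record must be durable
    before the corresponding change is applied to the data file."""
    # Step 1: append the commit record to the log and ask the kernel to
    # force it to stable storage. Without cache flushes reaching the
    # drives, "stable" may really mean "in the drive's write cache".
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()
        os.fsync(log.fileno())
    # Step 2: only after the log write is acknowledged, apply the change
    # to the data file and force that to storage as well.
    with open(data_path, "ab") as data:
        data.write(record + b"\n")
        data.flush()
        os.fsync(data.fileno())
```

The point of the sketch is only the ordering between step 1 and step 2; the database relies on the storage stack to preserve that ordering across a power loss or reset.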

For the near term, the solutions we are considering are:
0) disable the write caches altogether; probably not feasible because
of the performance regression involved.
1) fix the misbehaving device (the cause of the resets).
2) use a software watchdog with a shorter timeout to make the drives
disable their write caches, so that the commits issued before the
imminent reboot are written to disk in order.
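
Options 0) and 2) could be sketched roughly as follows. This is a hypothetical illustration, not a tested tool. It assumes that the array is `md0`, that member names can be read from the `/proc/mdstat` line for that array, that stripping trailing digits from a member partition name yields the whole disk (true for names like `sda1`, not for every naming scheme), and that `hdparm -W0 /dev/sdX` is used to turn off each drive's write cache. The `CacheFlushWatchdog` class is also hypothetical. It can only help if userspace is still running when the PCI device misbehaves, and its timeout must leave enough time for the hdparm calls to finish before the hardware watchdog fires.

```python
import re
import subprocess
import threading
from typing import Callable, List, Optional


def raid_members(mdstat: str, array: str) -> List[str]:
    """Return member names (e.g. 'sda1') of `array` from /proc/mdstat text."""
    for line in mdstat.splitlines():
        if line.startswith(array + " :"):
            # Members appear as 'sda1[0]', possibly followed by '(F)' or '(S)'.
            return re.findall(r"(\S+?)\[\d+\]", line)
    return []


def disk_of(member: str) -> str:
    """Assumption: strip a trailing partition number ('sda1' -> 'sda')."""
    return re.sub(r"\d+$", "", member)


def disable_write_caches(disks: List[str]) -> None:
    """Option 0 / 2: ask each drive to turn off its write-back cache."""
    for disk in disks:
        # hdparm -W0 disables the drive's write-caching feature.
        subprocess.run(["hdparm", "-W0", "/dev/" + disk], check=False)


class CacheFlushWatchdog:
    """Hypothetical software watchdog with a timeout shorter than the
    hardware watchdog's. If not kicked in time, it calls `on_expire` once."""

    def __init__(self, timeout_s: float, on_expire: Callable[[], None]):
        self.timeout_s = timeout_s
        self.on_expire = on_expire
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def kick(self) -> None:
        # Restart the countdown; call this regularly from the healthy main loop.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout_s, self.on_expire)
            self._timer.daemon = True
            self._timer.start()

    def stop(self) -> None:
        # Cancel any pending countdown without running `on_expire`.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

In use, one would map `raid_members(open("/proc/mdstat").read(), "md0")` through `disk_of`, pass `lambda: disable_write_caches(disks)` as `on_expire`, and call `kick()` from whatever loop currently feeds the hardware watchdog, at an interval well below `timeout_s`.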

Any other ideas?

Regards,
-- 
Leon

*http://en.wikipedia.org/wiki/ACID