On Wed, 2005-02-16 at 07:14, Alban Hertroys wrote:
> Marco Colombo wrote:
> > On Wed, 16 Feb 2005, Andrew Hall wrote:
> >
> >> fsync is on for all these boxes. Our customers run their own hardware
> >> with many different specification of hardware in use. Many of our
> >> customers don't have UPS, although their power is probably pretty
> >> reliable (normal city based utilities), but of course I can't
> >> guarantee they don't get an outage once in a while with a thunderstorm
> >> etc.
> >
> > I see. Well I can't help much, then, I don't run PG on XFS. I suggest
> > testing on a different FS, to exclude XFS problems. But with fsync on,
> > the FS has very little to do with reliability, unless it _lies_ about
> > fsync(). Any FS should return from fsync only after data is on disc,
> > journal or not (there might be issues with meta-data, but it's hardly
> > a problem with PG).
> >
> > It's more likely the hardware (IDE disks) lies about data being on plate.
> > But again that's only in case of sudden poweroffs.
>
> Do you happen to have the same type disks in all these systems? That
> could point to a disk cache "problem" (f.e. the disks lying about having
> written data from the cache to disk).
>
> Or do you use the same disk parameters on all these machines? Have you
> tried using the disks w/o write caching and/or in synchronous mode
> (contrary to "async").

I was wondering whether this problem has ever shown up on a machine that
HADN'T lost power abruptly.

IFF the only machines that experience corruption have lost power at some
point beforehand, then I would look at the drives, the controller, or the
file system.

I know there are write modes in ext3 that will allow corruption on power
loss (I think it's writeback). I know little of XFS in a production
environment, as I run ext3, warts and all.
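
To make the fsync() point concrete, here is a minimal Python sketch of what "return from fsync only after data is on disc" means from the application side. The helper name durable_write is made up for illustration, and this is not how PostgreSQL itself does it. Note that it only asks the OS and file system to flush; if the drive's write cache lies about completion, this call will still return successfully, which is exactly the failure mode being discussed.

```python
import os


def durable_write(path, data):
    """Write bytes to path and return only after fsync() has completed.

    Hypothetical helper for illustration. A successful return means the
    kernel reported the data as flushed to the device, not that the
    drive's own write cache has actually reached the platter.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Block until the OS reports the file's data as flushed.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If corruption only ever follows a power cut even with code like this in the write path, that points away from the application and towards the drive cache, controller, or file system mount options.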