On Mon, Oct 24, 2011 at 11:37 PM, David Boreham <david_list@xxxxxxxxxxx> wrote:
>> What about redundancy?
>>
>> How do you swap an about-to-die SSD?
>>
>> Software RAID-1?
>
> The approach we take is that we use 710 series devices which have predicted
> reliability similar to all the other components in the machine, therefore
> the unit of replacement is the entire machine. We don't use trays for
> example (which saves quite a bit on data center space). If I were running
> short endurance devices such as 320 series I would be interested in
> replacing the drives before the machine itself is likely to fail, but I'd do
> so by migrating the data and load to another machine for the replacement to
> be done offline. Note that there are other operations procedures that need
> to be done and can not be done without downtime (e.g. OS upgrade), so some
> kind of plan to deliver service while a single machine is down for a while
> will be needed regardless of the storage device situation.

Interesting. But what about unexpected failures, such as faulty
electronics? I really don't think a production server can do without at
least RAID-1.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance