tjo@xxxxxxx ("TJ O'Donnell") writes:
> I am getting in the habit of storing much of my day-to-day
> information in postgres, rather than "flat" files.
> I have not had any problems of data corruption or loss,
> but others have warned me against abandoning files.
> I like the benefits of enforced data types, powerful searching,
> data integrity, etc.
> But I worry a bit about the "safety" of my data, residing
> in a big scary database, instead of a simple friendly
> folder-based files system.
>
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."
>
> How naive (optimistic?) is it to think that "the database" can
> replace "the filesystem"?

There is certainly some legitimacy to the claim; the demerits of
things like the Windows Registry, as compared to "plain text
configuration," have been pretty clear. If the "monstrous fragile
binary data structure" gets stomped on, by any means, then you can
lose data in pretty massive and invisible ways.

This is most pointedly true if the data representation conflates data
and indexes in some attempt to "simplify" things by having Just One
File. In such a case, corruption of *any* block has the potential to
destroy the whole database irretrievably.

However, the argument may also be taken too far.

-> A PostgreSQL database does NOT assemble data into "one monstrous
   fragile binary data structure." Each table is stored in its own
   data files, and each index in separate files of its own. Blowing up
   an index file *doesn't* blow up the data; the index can be rebuilt
   from the table (e.g. with REINDEX).

-> You are taking regular backups, right??? If you are, that's a
   considerable mitigation of risk. I don't believe it's typical to
   set up off-site backups of one's Windows Registry, by contrast...

-> In the case of PostgreSQL, mail stored in tuples is likely to get
   TOASTed, which changes the shape of things further: large values
   are compressed, and may be moved out of line into a separate TOAST
   table, so what ends up on disk is smaller, which changes the
   "target profile" for this data.

-> In the other direction, storing the data as a set of files, each of
   which requires metadata kept in binary filesystem data structures,
   provides an (invisible-to-the-user) interface to what is, no more
   and no less, a "monstrous fragile binary data structure." That is,
   after all, what a filesystem is, if you strip away the visible APIs
   that turn it into open()/close()/mkdir() calls. If the wrong
   directory block gets "crunched," then /etc could get munched just
   as the Windows Registry could.

Much of the work that has gone into filesystems over the last dozen
years is *exceedingly* similar to the work that has gone into managing
storage in DBMSes. People working in both areas borrow from each
other, and the natural result is that both live in similarly
transparent houses. Someone who "casts stones" of the sort in your
quote is making the fallacious assumption that, because the fact that
a filesystem is a database of file information is kept largely
invisible, a filesystem is somehow fundamentally less vulnerable to
the same kinds of corruption. In reality, they are vulnerable in
similar ways.

The one *further* visible merit I could point to in Eudora that DOES
retain validity is that not terribly much metadata is entrusted to the
filesystem. Much the same is true of Rand MH (the "Message Handling
System"), where each message is a file with very little
filesystem-based metadata. If you should have a filesystem failure,
discover a zillion no-longer-named files in lost+found, and decline to
recover from a backup, it should nonetheless be possible to re-process
them through your mail filters and rebuild a set of mail folders that
looks roughly like it did before.
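
The re-filtering recovery described above can be sketched in Python. This is a minimal illustration under stated assumptions, not MH's own tooling: the filter rules in FILTER_RULES, the helper names, and the directory layout are all hypothetical, and it assumes each orphaned file in lost+found holds exactly one complete message. Each message is parsed with the standard-library email package, assigned a folder by the first matching rule, and written into that folder as the next numbered file, MH-style.

```python
import email
import email.policy
import tempfile
from pathlib import Path

# Hypothetical filter rules: (header, substring, destination folder).
# The first rule whose substring appears in the header value wins.
FILTER_RULES = [
    ("List-Id", "pgsql-general", "lists/pgsql-general"),
    ("From", "@example.com", "work"),
]
DEFAULT_FOLDER = "inbox"


def choose_folder(msg, rules=FILTER_RULES, default=DEFAULT_FOLDER):
    """Return the folder assigned by the first matching rule, else default."""
    for header, needle, folder in rules:
        value = str(msg.get(header, ""))
        if needle.lower() in value.lower():
            return folder
    return default


def next_message_number(folder):
    """MH names messages 1, 2, 3, ...; return the next unused number."""
    numbers = [int(p.name) for p in folder.iterdir() if p.name.isdigit()]
    return max(numbers, default=0) + 1


def refile_lost_and_found(lost_dir, mail_root):
    """Re-run the filters over each orphaned file and refile it MH-style.

    Assumes every regular file in lost_dir holds one complete message.
    Returns a dict mapping folder name -> number of messages filed.
    """
    lost_dir, mail_root = Path(lost_dir), Path(mail_root)
    counts = {}
    for path in sorted(lost_dir.iterdir()):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        msg = email.message_from_bytes(raw, policy=email.policy.default)
        name = choose_folder(msg)
        folder = mail_root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Write the raw bytes untouched: the headers are the redundancy
        # that makes this whole rebuild possible.
        (folder / str(next_message_number(folder))).write_bytes(raw)
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lost = Path(tmp, "lost+found")
        lost.mkdir()
        # Simulated orphans: the names are gone (fsck-style "#inode" names),
        # but the message contents survived.
        (lost / "#1201").write_bytes(b"From: alice@example.com\nSubject: status\n\nok\n")
        (lost / "#1202").write_bytes(
            b"From: bob@other.org\nList-Id: <pgsql-general.postgresql.org>\n"
            b"Subject: TOAST\n\nhi\n"
        )
        print(refile_lost_and_found(lost, Path(tmp, "Mail")))
```

The point of the design is that nothing but the raw message is consulted: because the headers were never stripped out, every folder assignment can be recomputed from scratch. Note that the sketch appends after any existing message numbers, so running it twice over the same files would file duplicates.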

The possibility of such a rebuild implies that there is *more*
"conservatism of format" here than first meets the eye: the data is
left in raw form, replete with redundancies (such as the full message
headers the filters match on) that can *never* be taken out if the
ability to perform this recovery is to be retained.

-- 
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxfinances.info/info/advocacy.html
"Lumping configuration data, security data, kernel tuning parameters,
etc. into one monstrous fragile binary data structure is really dumb."
- David F. Skoll