On Thu, Oct 30, 2008 at 03:50:20PM +1100, Grant Allen wrote:
> One other thing I forgot to mention: Compression by the DB trumps
> filesystem compression in one very important area - shared_buffers! (or
> buffer_cache, bufferpool or whatever your favourite DB calls its working
> memory for caching data). Because the data stays compressed in the
> block/page when cached by the database in one of its buffers, you get
> more bang for your memory buck in many circumstances! Just another angle
> to contemplate :-)

The database research project known as MonetDB/X100 has been looking at
this recently. The first paper below gives a bit of an introduction to
the design of the database, and the second covers the effects of
different compression schemes:

  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuBoNeHe:DEBULL:05
  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuHeNeBo:ICDE:06

The important thing seems to be that you don't want a storage-efficient
compression scheme. Decent RAID subsystems demand a very lightweight
scheme that can be decompressed at several GB/s (i.e. two or three
cycles per tuple, rather than the 50 to 100 of traditional schemes like
zlib or bzip).

It's very interesting reading (the references to "commercial DBMS `X'"
are somewhat comical), but it's a *long* way from being directly useful
to Postgres. Some of the things they talk about are worth bearing in
mind when writing new code. What has stuck in my mind the most is the
importance of designing cache-conscious algorithms, and then of staying
cache-conscious when actually writing the code.

Am I just old-fashioned, or is this focus on cache-conscious design
quite a new thing and somewhat undervalued in the rest of the software
world?

  Sam

p.s.
if you're interested, there are more papers about MonetDB here:

  http://monetdb.cwi.nl/projects/monetdb/Development/Research/Articles/index.html

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general