
Re: Are there plans to add data compression feature to postgresql?


 



Gregory Stark wrote, On 01-11-08 14:02:
> Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> writes:

>> But sorry I still can't get WHY compression as a whole and data
>> integrity are mutually exclusive.
...
[snip performance theory]

> Postgres *guarantees* that as long as everything else works correctly it
> doesn't lose data. Not that it minimizes the chances of losing data. It is
> interesting to discuss hardening against unforeseen circumstances as well but
> it's of secondary importance to first of all guaranteeing 100% that there is
> no data loss in the expected scenarios.
>
> That means Postgres has to guarantee 100% that if the power is lost mid-write
> that it can recover all the data correctly. It does this by fsyncing logs of
> some changes and depending on filesystems and drives behaving in certain ways
> for others -- namely that a partially completed write will leave each byte
> with either the new or old value. Compressed filesystems might break that
> assumption making Postgres's guarantee void.
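
To make that ordering concrete, here is a minimal sketch of the write-ahead idea Gregory describes (this is NOT PostgreSQL's actual WAL code; the record layout and names are made up): log the full new value first, fsync the log, and only then overwrite the data file in place.

#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log record: "set offset X in the data file to these bytes". */
struct log_record {
    off_t offset;
    char  bytes[32];
};

int apply_change(int log_fd, int data_fd, const struct log_record *rec)
{
    /* 1. Append the complete new value to the log... */
    if (write(log_fd, rec, sizeof(*rec)) != (ssize_t) sizeof(*rec))
        return -1;

    /* 2. ...and force it to stable storage before touching the data file. */
    if (fsync(log_fd) != 0)
        return -1;

    /* 3. Only now overwrite the data file in place. If power fails here,
     *    recovery replays the fsync'ed log record, so a half-written page
     *    in the data file does not matter. */
    if (pwrite(data_fd, rec->bytes, sizeof(rec->bytes), rec->offset) < 0)
        return -1;

    return 0;
}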

The guarantee YOU want from the underlying file system is that, in case of, let's say, a power failure:

* Already existing data is not modified.
* Overwritten data might be corrupted, but it's either old or new data.
* If an fsync completes, all written data IS committed to disk.

If a (file) system CAN guarantee that, in any way possible, it is safe to use with PostgreSQL (assuming my list is complete, of course).

As a side note: I consider the second assumption a bit too strong, but there are probably good reasons for it.
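
Just to spell out what that second bullet demands (at byte granularity, the way Gregory states it): a purely hypothetical post-crash checker like the one below would have to succeed for any block found on disk, given the old and new images of that block.

#include <stddef.h>

/* Returns 1 if every byte of the block found on disk after the crash
 * matches either the old image or the new image, 0 if some byte is
 * neither, which is exactly what the second assumption forbids. */
int block_is_old_or_new_per_byte(const unsigned char *found,
                                 const unsigned char *old_img,
                                 const unsigned char *new_img,
                                 size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (found[i] != old_img[i] && found[i] != new_img[i])
            return 0;
    return 1;
}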

> I don't know how these hypothetical compressed filesystems are implemented so
> I can't say whether they work or not. When I first wrote the comment I was
> picturing a traditional filesystem with each block stored compressed. That
> can't guarantee anything like this.

Instead, the discussion turns to file systems without even a glance at how they operate: no algorithm used by the file system is written down, yet they get discussed anyway.

> However later in the discussion I mentioned that ZFS with an 8k block size
> could actually get this right since it never overwrites existing data, it
> always writes to a new location and then changes metadata pointers. I expect
> ext3 with data=journal might also be ok. These both have to make performance
> sacrifices to get there though.

Here, instead, we finally get the specifics we needed a long time ago: ZFS uses 8 kB as its optimal block size(*) and never overwrites existing data. So it should be as safe as any other file system, if that is indeed correct.

Now, does a different block size (in ZFS or in PostgreSQL) make any difference to that? No, it still guarantees the list above.
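
For what it's worth, the "write somewhere new, then switch a pointer" pattern Gregory attributes to ZFS can be sketched with plain POSIX calls. This is of course not ZFS code; rename() merely stands in for the atomic block-pointer switch ZFS does internally.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write a complete new version of a file somewhere else, make it durable,
 * then atomically swap it into place. Readers see either the whole old
 * file or the whole new one, never a half-written mixture. */
int replace_file_atomically(const char *path, const char *tmp_path,
                            const void *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The new version must be fully on disk before the pointer flips. */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* The "pointer switch": rename() replaces the old file atomically. */
    return rename(tmp_path, path);
}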

Performance is a discussion better left alone, since it depends heavily on your workload, installation and other specifics. It could be better, and it could be worse.

- Joris


(*) Larger block sizes improve the compression ratio. However, you pay a bigger penalty on writes, as more data must be read, processed and written.
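
A quick, unscientific way to see the ratio half of that trade-off is to compress the same buffer with different block sizes, for example with zlib as below (build with -lz). The sample data and block sizes are arbitrary, so the absolute numbers mean nothing; only the trend does.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* Compress buf in fixed-size blocks and return the total compressed size. */
static size_t compressed_size(const unsigned char *buf, size_t len,
                              size_t block_size)
{
    size_t total = 0;
    for (size_t off = 0; off < len; off += block_size) {
        size_t chunk = len - off < block_size ? len - off : block_size;
        uLongf out_len = compressBound(chunk);
        unsigned char *out = malloc(out_len);
        if (compress2(out, &out_len, buf + off, chunk,
                      Z_DEFAULT_COMPRESSION) == Z_OK)
            total += out_len;
        free(out);
    }
    return total;
}

int main(void)
{
    /* Repetitive sample data; real tables compress very differently. */
    size_t len = 1 << 20;
    unsigned char *buf = malloc(len);
    for (size_t i = 0; i < len; i++)
        buf[i] = "PostgreSQL "[i % 11];

    printf("8 kB blocks:   %zu compressed bytes\n",
           compressed_size(buf, len, 8 * 1024));
    printf("128 kB blocks: %zu compressed bytes\n",
           compressed_size(buf, len, 128 * 1024));
    free(buf);
    return 0;
}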

