Re: Are there plans to add data compression feature to postgresql?

Sam Mason <sam@xxxxxxxxxxxxx> · Mon, 3 Nov 2008 02:19:14 +0000

On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
> Sam Mason wrote:
> >Your lzop numbers look *very* low; the paper suggests
> >compression going up to ~0.3GB/s on a 2GHz Opteron.
> 
> Er ... ENOCOFFEE? . s/Mb(it)?/MB/g . And I'm normally *so* careful about 
> Mb/MB etc; this was just a complete thinko at some level. My apologies, 
> and thanks for catching that stupid error.

Nice to know we're all human here :)

> The paragraph should've read:
> 
> I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo 
> laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both 
> cases only a single core is used. With 7zip (LZMA) it only manages 3.1 
> MB/s (24.8 Mb/s) using BOTH cores together.

Hum, I've just had a look and found that Debian has a version of a lzop
compression program.  I uncompressed a copy of the Postgres source for
a test and I'm getting around 120MBs when compressing on a 2.1GHz Core2
processor (72MB in 0.60 seconds, fast mode).  If I save the output
and recompress it I get about 40MB/s (22MB in 0.67 seconds), so the
compression rate seems to be very dependent on the type of data.  As
a test, I've just written some code that writes out (what I guess the
"LINENUMBER" test is in the X100 paper) a file consisting of small
integers (less than 2 decimal digits, i.e. lots of zero bytes) and
now get up to 0.4GB/s (200MB in 0.5 seconds), which nicely matches my
eyeballing of the figure in the paper.

It does point out that compression rates seem to be very data dependent!

> So - it's potentially even worth compressing the wire protocol for use 
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

The problem is that then you're then dedicating most of a processor to
doing the compression, one that would otherwise be engaged in doing
useful work for other clients.

BTW, the X100 work was about trying to become less IO bound; they had
a 350MB/s RAID array and were highly IO bound.  If I'm reading the
paper right, with their PFOR algorithm they got the final query (i.e.
decompressing and doing useful work) running at 500MB/s.

  Sam

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general