Re: Converting MySQL tinyint to PostgreSQL

"Jim C. Nasby" <decibel@xxxxxxxxxxx> · Sun, 17 Jul 2005 17:47:37 -0500

On Sat, Jul 16, 2005 at 03:18:24PM -0700, Ron Mayer wrote:
> Jim C. Nasby wrote:
> >On Thu, Jul 14, 2005 at 11:29:23PM +0200, Martijn van Oosterhout wrote:
> >>On Thu, Jul 14, 2005 at 11:30:36AM -0500, Jim C. Nasby wrote:
> >>>On Wed, Jul 13, 2005 at 07:52:04PM -0400, Bruce Momjian wrote:
> >>>
> >>>>This is a good point.  We have always stored data on disk that exactly
> >>>>matches its layout in memory.  We could change that, but no one has
> >>>>shown it would be a win.
> >>>
> >>[...]
> >
> >My thought was to convert as pages were read and written. That should
> >minimize the code impact.
> 
> If that were practical, even more radical I/O saving tricks might be
> possible beyond removing alignment bytes - like some compression algorithm.

True, though there's a few issues with zlib compression. First, you have
to be able to pull specific pages out of the files on disk. Right now
that's trivial; you just read bytes xxx - yyy. With compression things
are more difficult, because you no longer have a fixed page size.

Another issue is that with a variable disk page size, you have to deal
with what happens when you try to put a page back on disk but the page
is now larger than it's original size.

These issues are why I suggested a fixed disk page size and a variable
in-memory page size; it simplifies things a bit. It does however create
some problems of it's own. When you go to transform/compress a page to
put it on disk if the in-memory page is now too large you'll need to
move some tuples to another page.

Something else to consider is that a simple compression scheme such as
eliminating alignment padding makes it easy to determine how large a
tuple will be on disk versus in memory. This means you can do things
like determine at the time of tuple creation if that tuple will fit in
an existing page or not. I don't know if the same can be said for other
methods. Another factor is that more complex compression methods will be
much more CPU intensive.

FWIW, the way oracle handles compression is as a one-time operation.
When you tell it to compress a table it will re-write the entire table,
compressing it as it goes. But any pages that get changed after that
will end up uncompressed. Of course in a data warehouse environment
that's perfectly acceptable.

Ultimately I don't see anything being done along these lines unless
someone can come up with some data indicating performance gains, which
will probably mean hacking some amount of this in and benchmarking it.
Although for the case of simple elimination of alignment padding you
could probably come up with some pretty good estimates just by looking
at a table's layout and it's statistics.
-- 
Jim C. Nasby, Database Consultant               decibel@xxxxxxxxxxx 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings