On Fri, 2 Dec 2005, Luke Lonergan wrote:
Stephen,
On 12/2/05 1:19 PM, "Stephen Frost" <sfrost@xxxxxxxxxxx> wrote:
I've used the binary mode stuff before, sure, Postgres may have to
convert some things but I have a hard time believing it'd be more
expensive to do a network_encoding -> host_encoding (or toasting, or
whatever) than to do the ascii -> binary change.
From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can (I
know *we* can :-). It's a matter of safety and generality - in general you
can't be sure that client machines / OSes will perform the same conversions
that the backend does in all cases IMO.
One more thing - this is really about the lack of a cross-platform binary
input standard for Postgres IMO. If there were such a thing, it *would* be
safe to do this. The current Binary spec is not cross-platform AFAICS, it
embeds native representations of the DATUMs, and does not specify a
universal binary representation of same.
For instance - when representing a float, is it an IEEE 32-bit floating
point number in little endian byte ordering? Or is it IEEE 64-bit? With
libpq, we could do something like an XDR implementation, but the machinery
isn't there AFAICS.
This makes sense. However, it then raises the question of how much effort
it would take to define such a standard and implement the shim layer
needed to accept the connections, versus how much of a speedup it would
yield (the gain could probably be approximated with just a little hacking
to use the existing binary format between two machines of the same type).
As for the standard: network byte order is big-endian, so that is what
should be used (in spite of the quantity of x86 machines out there). For
the size of the data elements, using the largest size for each will
probably still be a win in size compared to ASCII. Converting between
binary formats is usually a matter of a few AND and shift opcodes (and
with the core so much faster than its memory, you can afford to do quite a
few of these on each chunk of data without it being measurable in your
overall time)
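To illustrate the point (Python used only for brevity - in the backend this
would be a handful of C instructions, and `swap32` is a hypothetical helper
name, not anything in Postgres):

```python
import struct

def swap32(x: int) -> int:
    """Swap a 32-bit value between little- and big-endian using only
    the masks and shifts mentioned above - a few cheap ALU ops."""
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))

# A float becomes platform-independent once its IEEE 754 bit pattern
# is sent in big-endian (network) order:
wire = struct.pack('>f', 1.5)            # 4 bytes, big-endian IEEE 754
assert struct.unpack('>f', wire)[0] == 1.5
```

Compared to parsing ASCII digits, a fixed-width swap like this is
essentially free per element.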
An alternative would be to add a 1-byte type tag before each data element
to specify its type, but then the server-side code would have to be
smarter to deal with the additional possibilities.
David Lang