Re: Multiline plpython procedure

Martijn van Oosterhout <kleptog@xxxxxxxxx> · Fri, 21 Jan 2005 13:42:42 +0100

On Fri, Jan 21, 2005 at 12:02:09PM +0100, Marco Colombo wrote:
> On Fri, 21 Jan 2005, Greg Stark wrote:
> >I don't think it's reasonable for pg_dump to think about converting
> >data from one language to another. It's important for pg_dump to
> >restore an identical database. Having it start with special case
> >data conversion from one flavour to another seems too dangerous.
> 
> Makes no sense. pg_dump already makes a lot of conversions: from internal
> representation (which may be platform dependent) to some common format,
> say text. It's just multi-line text which is hard to deal with, because
> there is _no_ single format for it. pg_dump may just choose one format, and
> stick with it. Every dump/restore will work. You may have trouble editing
> a text dump, but that's another matter. BTW, what does pg_dump do on Windows?
> I mean with -F p. Does it produce a text file with CRLF line separators?
> What happens if you feed that file to psql on a Unix box?

Ah, but you see, that's looking at it from your point of view. pg_dump
doesn't interpret text strings. For example, the Python script in a
function is an opaque string. Not multiline, nothing. All PostgreSQL
does is pass that block of opaque data to the interpreter for that
language. pg_dump dumps that opaque data into the output, and the
CREATE FUNCTION dumps that opaque data back into the system tables.
PostgreSQL doesn't understand Python any more or less than Perl, Tcl, R
or any other language.

The argument here is basically that this opaque data has different
meanings for Python on Windows and Python on Unix. You can't make any
special cases, because I can rename plperl.so to plpython.so (or
vice-versa) and then the opaque data won't be passed to the interpreter
that you'd expect from looking at the definition.

> I'm for defining a format used by PostgreSQL, and force the python parser 
> into accepting it on all platforms. That is, let's set the rule that
> python programs to be embedded into PostgreSQL use \n as line termination.

Wouldn't that disadvantage non-Unix PL/Python users, whose Python
functions would have to be converted at run-time to conform to the
local text format? With the extra bummer that the resulting string may
not be the same size either. Remember, PostgreSQL uses the standard
shared library for the language on the platform; it doesn't build its
own.

But sure, preprocessing the source at run-time seems to be the only
realistic solution without a change to the interpreter.
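
To make that concrete, here is a rough sketch of what such a
preprocessing step could look like, written in Python only for
illustration. The helper name normalize_newlines and the sample function
body are made up for this example, and I'm assuming an interpreter that
only accepts "\n" as a line terminator. It also shows the size change
mentioned above: every CRLF pair shrinks to a single character.

```python
def normalize_newlines(source):
    """Convert CRLF and lone CR line endings to LF (hypothetical helper)."""
    # Handle "\r\n" first so a CRLF pair does not become two newlines.
    return source.replace("\r\n", "\n").replace("\r", "\n")


# A function body as it might have been stored by a Windows client.
body = "x = 1\r\nif x:\r\n    y = x + 1\r\n"

clean = normalize_newlines(body)

# The normalized text is what would be handed to the interpreter.
code = compile(clean, "<function body>", "exec")
namespace = {}
exec(code, namespace)
```

Note that a blanket replacement like this also rewrites any raw carriage
return that happens to sit inside a string literal in the source, so it
is not a completely transparent transformation.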

> Think of this: tomorrow we meet people from Mars. One of them really likes
> PostgreSQL, and ports it to their platform. Being a martian platform, it
> uses a different text file format. Line separator there is the first 1000

<snip>

Spurious argument. You're assuming Martians would use ASCII to write
programs without using one of the two defined line-ending characters.
If they were smart they'd simply use a character set which doesn't have
the ambiguity. If they even use 8-bit bytes. An ASCII C compiler won't
compile EBCDIC source code either, but nobody thinks that's
unreasonable, probably because nobody uses EBCDIC anymore :).

No-one is complaining about the use of line-ending characters; they
could have said that you need a semi-colon to separate "lines". The
problem is that it's *not consistent* across platforms.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.