On Fri, 21 Jan 2005, Greg Stark wrote:
Marco Colombo <marco@xxxxxx> writes:
Exactly. Or, one could say: the "standard" text format is the one dictated by the platform you are running on. Which is what python does.
Egads. So the set of valid Python programs is different depending on what platform you're on? That's just, uhm, insane. So essentially Python isn't a single language, it's a set of languages, Python-NL, Python-NLCR, Python-CR, (and in theory others).
No. Just like any other application that reads text files, it reads text files. It's that simple. It's unfortunate that 'text file' means different things on different platforms.
So if I generate a database with a Python-CRNL function on Windows, then pg_dump it and load it on Unix, the function won't run because it's the wrong language; Unix only supports Python-NL.
I don't think it's reasonable for pg_dump to think about converting data from one language to another. It's important for pg_dump to restore an identical database. Having it start doing special-case data conversion from one flavour to another seems too dangerous.
Makes no sense. pg_dump already makes a lot of conversions: from the internal
representation (which may be platform dependent) to some common format,
say text. It's just multi-line text that is hard to deal with, because
there is _no_ single format for it. pg_dump may just choose one format and
stick with it. Every dump/restore will work. You may have trouble editing
a text dump, but that's another matter. BTW, what does pg_dump do on Windows?
I mean with -F p. Does it produce a text file with CRNL line separators?
What happens if you feed that file to psql on a Unix box? I've tried it on
Unix (adding spurious CRs), and I think SQL treats CR as whitespace, so
it's no issue. But what about the opposite? Is psql on Windows able to
recognize SQL scripts made on Unix? (I can't try this.)
Anyway, think of floats. If you want to do FP maths fast, you need to use the native format supported by the CPU. When you dump, you get a text form of the FP number, and when you restore on a different platform you may get a _different_ number. And you have to live with it. Kiss goodbye to your "identical database".
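The same kind of loss can be seen even on a single platform whenever the text form carries too few digits. A small sketch in Python, assuming 15 significant digits as a stand-in for a lossy text dump (this is only an illustration, not a description of what pg_dump actually emits):

```python
def dump_and_restore(x: float, digits: int) -> float:
    """Format x as text with the given number of significant digits, then parse it back."""
    return float(f"{x:.{digits}g}")


value = 0.1 + 0.2  # 0.30000000000000004 as an IEEE 754 double

# 15 significant digits: the text is "0.3", which parses to a different double.
print(dump_and_restore(value, 15) == value)  # False

# 17 significant digits are always enough to round-trip an IEEE 754 double.
print(dump_and_restore(value, 17) == value)  # True
```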
Incidentally, are we sure we've diagnosed this correctly? I'm discussing this with some Python developers and they're expressing skepticism. One just tried a quick test with a Python program containing a mixture of all three newline flavours and it ran fine.
Recent python has universal newline support. It works for files, and it's enabled by default when it reads source files. But it's NOT part of the parser, AFAIK: the source file gets converted to Unix format before being fed to the parser (lexer). The problem is that I'm not sure that's the way python is used by PostgreSQL. It works only when the program is read from a file, which is probably what the guy tested. If you build a program, put it in a string, and invoke the parser, the string must be in Unix format.
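To illustrate the file vs. string distinction, here is a sketch using modern Python 3 defaults rather than the interpreter under discussion: opening a file in text mode (default newline=None) translates all three newline flavours to "\n", while a string built in memory keeps them as they are. Note that, per the Python documentation, compile() itself accepts Windows and Mac newlines in strings since 2.7/3.2, so the strict string behaviour described above applies to older interpreters.

```python
import os
import tempfile

# Python source with all three newline flavours: CRLF, CR and LF.
raw = b"x = 1\r\ny = 2\rz = 3\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as f:
    f.write(raw)
    path = f.name

try:
    # Text mode with the default newline=None enables universal newlines:
    # "\r\n" and "\r" are both translated to "\n" on read.
    with open(path) as f:
        from_file = f.read()
finally:
    os.remove(path)

# A string built in memory gets no such translation.
from_string = raw.decode("ascii")

print(repr(from_file))    # 'x = 1\ny = 2\nz = 3\n'
print(repr(from_string))  # 'x = 1\r\ny = 2\rz = 3\n'
```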
I'm for defining a format used by PostgreSQL, and force the python parser into accepting it on all platforms. That is, let's set the rule that
python programs to be embedded into PostgreSQL use \n as line termination.
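Under such a rule, the conversion itself is trivial wherever it ends up living. A minimal sketch, assuming the function body arrives as a Python str; normalize_newlines and run_function_body are hypothetical names for illustration, not PL/Python's actual API:

```python
def normalize_newlines(source: str) -> str:
    """Convert CRLF and lone CR line terminators to LF."""
    # Replace CRLF first so its CR isn't turned into an extra blank line.
    return source.replace("\r\n", "\n").replace("\r", "\n")


def run_function_body(body: str) -> dict:
    """Compile and run a function body after normalizing its line endings.

    Hypothetical helper: it only stands in for whatever the server would do
    before handing the source to the Python parser.
    """
    code = compile(normalize_newlines(body), "<embedded function>", "exec")
    namespace: dict = {}
    exec(code, namespace)
    return namespace


# A body written on Windows (CRLF) and one from an old Mac (CR).
windows_body = "a = 1\r\nb = a + 1\r\n"
mac_body = "a = 1\rb = a + 1\r"

print(run_function_body(windows_body)["b"])  # 2
print(run_function_body(mac_body)["b"])      # 2
```

Doing the conversion once, before the parser sees the string, keeps the parser itself strictly \n-only, which is the point of the rule above.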
Think of this: tomorrow we meet people from Mars. One of them really likes PostgreSQL, and ports it to their platform. Being a martian platform, it uses a different text file format: the line separator there is the first 1000 binary digits of PI. When he writes a small python function on his client and tries to have it run on a server on Earth, it fails, because the python parser here won't handle PI-terminated lines correctly. What would you do? Bug python developers because "python is a set of languages, Python-Earth, Python-Mars, Python-Venus (and in theory others)"? (BTW, in that situation, I bet Perl would fail as well.) Or would you ask the martian guy to add, as part of his port effort, support for the martian line format to PostgreSQL, so that the server can convert the python program to Earth format before feeding it to python? Or, alternatively, just tell him: python programs in PostgreSQL are \n terminated? Which one is the simplest?
.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@xxxxxx