What I am trying to do is: building views on a base table, extended by one or more columns, extracted (hence the naming of the function "deserialize()") from a SINGLE column (XML) **that is in this same base table** (see below). Instructions for deserialization (that is: which 'fields' to look for) reside in some other table. There are MULTIPLE base tables, they basically look like this:
[table definition:]
BASETABLE(ID INT, model TEXT, field1 some_type, field2 some_type, ... fieldN some_type, serialized_data XML)
So, I wish to define multiple VIEWs based on a BASETABLE; one for each "model" (as stated in the above table definition: "model" is a property for each row). This QUERY would look like this (producing a VIEW for "MODEL1"; the query below in invalid, unfortunately):
>> SELECT base_t.*, deserialized.* FROM "BASETABLE" base_t, deserialize('MODEL1', base_t) as deserialized(fieldX some_type, fieldY some_type) WHERE base_t.model = 'MODEL1';
I have no problem with the requirement to supply the table type in the query; infact this is logical. Still, this query is impossible, obviously, because "base_t" as a target is not known in the context of the FROM-clause, where I whish to use it in calling "deserialize(...)". Ofcourse, I could write a "deserialize()" function for each base table (e.g. "deserialize_base1(...)") but I wish it to perform it's action on only rows that will actually be part of the result set; thus I want the WHERE-clause to apply to the function's seq scan álso. When provided, I whish to incorporated the user's WHERE-clause as well; this is done by the PostgreSQL RULE system...
Alternatively, the VIEW could be defined by the following query:
>> SELECT base_t.*, (deserialize('MODEL1', base_t) as temp(fieldX some_type, field_Y some_type)).* FROM "BASETABLE" base_t WHERE base_t.model = 'MODEL1';
This approach does not work either: deserialize(...) will return its set of fields as ONE field (comma separated, circumfixed by brackets); expressions within a SELECT-list seem to be only allowed to result in ONE column, except from the * shorthand...
** So, the question is: how can i feed my "deserialize()" function with a record subset (e.g. filter by a WHERE-clause) of the BASETABLE, while still returning a record?!? **
I tried the following approach also:
>> SELECT (SELECT fieldX FROM deserialize(base_t) deserialized(fieldX some_type, fieldY some_type)) "fieldX", (SELECT fieldY FROM deserialize(base_t) deserialized(fieldX some_type, fieldY some_type)) "fieldY" FROM "BASETABLE" table_t WHERE model= 'MODEL1';
Which infact worked, but caused the function to get invoked TWICE FOR EACH ROW (even when marked IMMUTABLE STRICT; or did I have to flush cached query plans in psql?).
Another approach would be to put all key/value pairs into a separate table (as one would do when implementing a EAV-model within a RDBMS) which is then to be joined (and joined again... and possibly again (!); in case of MULTIPLE additional rows -- depending on the definition of the VIEW) onto the BASETABLE, rather than to deserialize from XML which is stored within the same record... How does this approach then actually translate in terms of table scans? Will they be limited by the filter on the BASETABLE, as the available values to join on will be limited? At any rate: this approach will be more difficult to implement / maintain in case of EDITABLE VIEWS (inserts, update, delete)...
Hope any of you has some useful thoughts on this... It appears to me updating the additional (virtual) fields in the BASETABLE is much easier: the "serialize()"-function can be fed by a list of key/value pairs, producing some XML that can be stored in the xml field of "serialized_data", part of this same base table...
All this needs to be implemented fully in the database back-end; client application will not know they are talking to VIEWS rather than tables... Thus: the hosted database must simulate to provide various tables, whereas these are in fact stored in a limited number of base tables.
Thanks in advance, you guys out there!
Rob
2009/9/26 Merlin Moncure <mmoncure@xxxxxxxxx>
On Fri, Sep 25, 2009 at 2:40 PM, Rob Marjot <rob@xxxxxxxxxxxxxxxxxxxx> wrote:In a sense, what you are asking is impossible. Having a function (even
> Still no luck... To clarify a bit, take this example:
> CREATE OR REPLACE FUNCTION transpose()
> RETURNS record AS
> $BODY$ DECLARE
> output RECORD;
> BEGIN
> SELECT * INTO output FROM (VALUES(1,2)) as tbl(first,second);
> RETURN output;
> END;$BODY$
> LANGUAGE 'plpgsql' STABLE
> COST 100;
> Now, I expect to have 2 columns; named "first" and "second". However, like
> posted before, the flowing query:
> SELECT * FROM deserialize();
> produces only ONE column (in one row, as one would clearly expect from the
> function's defnition):
> deserialize
> -----------
> (1,2)
> (1 row)
>
> Any thoughts on how to make sure multiple columns are returned; without
> specifying this in the function's prototype return clause?
a C one) return 'record' does not get you out of having to define the
output columns...either in the function definition with 'out' or as
part of the calling query. Depending on what you are trying to do,
this could either matter a little or a lot. If it matters, why don't
you post some more details of the context of your problem and see if a
better solution can be found?
merlin