On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote: > On 8/30/06, Randall Lucas <rlucas@xxxxxxxxxxx> wrote: > > >An inverted way of thinking about the problem is the notion of getting a > >source document (say, a "company registration form") and parsing and > >storing it in a normalized format. How do you "tag" the origin of each > >and every row in every table that resulted from that source document? > > your form should have a code, either entered by the customer or by the > preparer who enters it into the database, which becomes the key that > identifies the registration document. Put that key into other tables. Yes -- I have flirted with this idea for the case of importing a set of facts from a single source. But where it breaks down is in being able to do the reverse -- ex post facto, to certify the results of a query as being accurate, and thereby attesting to the underlying facts. > be careful, you are flirting with EAV thinking. I think EAV designs > are terrible. (Though religion is not at issue here, I am pretty devoutly relational. I just want a good way to perform audits and other functions on my relations.) > > It seems to me that the elegant way to do this would be to get the > >entire graph of dependencies for not only all tables that reference the > >entity, but only those rows within those tables that refer specifically > >to my entity. > > > >The query engine should have a pretty good idea about which tables and > >which rows would actually be used in forming the responses to a given > >query. How can I get this information? Or am I off the deep end (-ency > >graph) with this one? > > I am not sure where you are going with this. Maybe you should mock up > some simple tables and repost your question. Imagine that I am the SEC. I look for cases of insider trading, and when I find a case, I have to go back and double-check all the pieces of evidence I am using against them. create table company ( id serial primary key, name text not null ); create table officer ( id serial primary key, company_id int not null references company(id) on delete cascade, name text not null, title text not null ); create table insider_trade ( id serial primary key, officer_id int not null references officer(id) on delete cascade, shares_traded numeric, share_price numeric, trade_date timestamptz ); insert into company (name) values ('goodco'); insert into company (name) values ('badco'); insert into officer (company_id, name, title) values (1, 'johnny b. good', 'ceo'); insert into officer (company_id, name, title) values (1, 'mother teresa', 'saint'); insert into officer (company_id, name, title) values (2, 'leroy brown', 'ceo'); insert into insider_trade (officer_id, shares_traded, share_price, trade_date) values (3, '50000', '6.66', '2006-07-04 1:23 PM PST'); Now, my database looks like this: select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id; id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date ----+--------+----+------------+----------------+-------+----+------------+---------------+-------------+------------------------ 1 | goodco | 1 | 1 | johnny b. good | ceo | | | | | 1 | goodco | 2 | 1 | mother teresa | saint | | | | | 2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07 (3 rows) If I want to know whom to investigate, I might do a query like this: select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id where insider_trade.id is not null; id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date ----+-------+----+------------+-------------+-------+----+------------+---------------+-------------+------------------------ 2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07 (1 row) Now that I have this query, in order to make my case, I need to "sign off" on all of the individual data that went into it. I would like to do something like: select last_query_shown_tuples(); schema | table_name | pk_columns | pk_values --------+---------------+------------+----------- public | company | [id] | [2] public | officer | [id] | [3] public | insider_trade | [id] | [1] (3 rows) (I am aware that, at least theoretically, my query is just as reliant upon the tuples that it *didn't* show, since they were acted upon by the where clause.) (I am also aware that this example is sub-optimal in that the primary keys are all single ints called 'id' and that they are present individually in the query result -- imagine that I used an aggregate function in a GROUP BY, for example, and you can see how the ids relied upon would become opaque, or imagine a multicolumn pk and you understand my odd notation.) Finally, I'd want to end up with something like: select last_query_shown_tuples(); schema | table_name | pk_columns | pk_values | audited_ts --------+---------------+------------+-----------+------------ public | company | [id] | [2] | 2006-08-31 18:52 public | officer | [id] | [3] | 2006-08-31 18:55 public | insider_trade | [id] | [1] | 2006-08-31 18:57 (3 rows) Is something like this even possible, much less doable at present? I have an inkling that with all of the capabilities of information_schema, rules, and triggers, it should be pretty close to doable. Best, Randall -- Randall Lucas Tercent, Inc. DF93EAD1