John McKown wrote > I have a table which has some "raw" data in it. By "raw", I mean it is > minimally processed from a log file. Every week, I update this table by > processing the weekly log using awk to create a "psql script" file which > looks similar to: > > COPY rawdata FROM STDIN; > .... lines created by awk script > \. > > The table schema is: > Table "jobrun.rawdata" > Column | Type | > ----------+---------------+ > lpar | character(4) | > yyddd | character(5) | > timedata | character(11) | > jobid | character(8) | > msgid | character(7) | > jobname | character(8) | > > Now, this data is not really very useful in its raw form. So I "process" > it > via a view: > > View "jobrun.rundata" > Column | Type | > ---------+--------------------------+ > lpar | character(4) | > msgid | character(7) | > jobname | character(8) | > jobid | character(8) | > msgtime | timestamp with time zone | > View definition: > SELECT rawdata.lpar, > rawdata.msgid, > rawdata.jobname, > rawdata.jobid, > to_timestamp((rawdata.yyddd::text || ' '::text) || > rawdata.timedata::text, 'YYDDD HH24:MI:SS.MS'::text) AS msgtime > FROM rawdata; > > My question is this: If I do a number of SELECTs on the "rundata" table. > So, would it be worth while to make this a table in itself? The plus of a > view is that I don't need to worry about updates. And I still have the > "raw" data around. In reality, this is just the first VIEW. I create three > other views. Two views are to "subset" the data based on the contents of > the "msgid" value (there are only 2 possibilities at present: 'START' and > 'END'). The final view, which is my actual information is a FULL OUTER > JOIN > of the START and END subset, based on lpar,jobname, and jobid: > > View "jobrun.runinfo" > Column | Type | > ----------+--------------------------+ > lpar | character(4) | > jobname | character(8) | > jobid | character(8) | > runstart | timestamp with time zone | > runend | timestamp with time zone | > View definition: > SELECT COALESCE(a.lpar, b.lpar) AS lpar, > COALESCE(a.jobname, b.jobname) AS jobname, > COALESCE(a.jobid, b.jobid) AS jobid, > a.msgtime AS runstart, > b.msgtime AS runend > FROM runstart a > FULL JOIN runend b ON a.lpar = b.lpar AND a.jobname = b.jobname AND > a.jobid = b.jobid; > > So the overhead may be quite high, because to SELECT from RUNINFO, > PostgreSQL must realize all four views. > > I appreciate your thoughts on if this is OK, given that performance is > currently acceptable. Mainly because this work is basically only done one > a > week, on Sundays. And I don't do it myself, it is done via a scheduler > (not > cron, but similar) which runs some scripts. I would likely make "jobrun.runinfo" into a table while leaving "jobrun.rawdata" as-is. I would have a function that populates "runinfo" from "rawdata" that I would call after performing the copy to "rawdata". There would be no views - unless you desire a view interface over "runinfo" for API or permission reasons. In 9.4 you can (probably) make "runinfo" an explicit MATERIALIZED VIEW and perform REFRESH command to accomplish the same thing - though I am not particularly familiar with the mechanics of that feature. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Design-table-vs-view-tp5811577p5811589.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.