On Wed, Sep 21, 2011 at 11:34 PM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote: [snip] > This can get complicated when you have triggers acting recursively on a > table and it isn't always that easy to understand exactly what a trigger > will see. I do agree with most all your points. The value I got out of this experience was the perspective I gained dealing with materialized views and being forced to think about the logic behind their implementation. As you said, this feature is friggen tough to do right. I read up a bit about Microsoft SQL’s “indexed views” and they too have a long list of things you can’t do, and from what I can tell, they only support the idea of keeping them up to date on every commit. So, since we’re not on this list to discuss how we can improve Oracle, I’d like to not make that the focus of my response. I’d rather have a discussion on what materialized views could mean in the Postgres world. I think my expectations, now slightly more validated through my realization of what’s possible with Oracle, would go something like this: 1) Though I might have given the impression that a “manual complete refresh” is not useful, I definitely see value in this especially for data warehousing scenarios. However, I’d almost want to call this a “snapshot” and not a “materialized view” – the two things are so different, I think it warrants different syntax. I think the ability to create a snapshot in time would be quite useful: CREATE SNAPSHOT Foo AS SELECT * FROM Bar; Now, I have “Foo” as a record in time, and can refer to it as I could any other table. I’d also like the ability to refresh it (via a schedule or a trigger): REFRESH SNAPSHOT Foo; Snapshots would remember their underlying query and could thus easily update at any time. 2) The huge feature here (which MS SQL implements as indexed views) are views that automatically update as the data underneath them changes. I’ve come to the conclusion that, while it’s impressive that Oracle can do anything close to this and have it actually work pretty well in a lot of scenarios, Oracle simply can’t have an all-encompassing knowledge of exactly how my database works and where all the data comes from. You know who does have that knowledge? Me. I believe I should have the power to instruct Postgres exactly when and how to update my view in situations where it cannot be automatically ascertained through the engine, rather than the DB saying “Sorry I can’t be perfect thus you can’t do that.” For me, I see this “chore” as a lot more appealing than trying to figure out why I can’t make the view that I want. I expect to be able to create two kinds of materialized views: Ones that COMPLETE refresh any time any referred column changes, and ones that do a FAST refresh. If I specify FAST but the engine can’t infer what it needs, I should get a NOTICE and it should fall back to a COMPLETE. If I specify nothing, it should create FAST if it can, and if not fall back to COMPLETE without notice. When creating a materialized view, I believe warnings should be issued when a column’s source cannot be inferred by the parser, but at the risk of being controversial, I think I should still be able to create the view anyway. I’ve always been partial to systems that allow you to shoot yourself in the foot. I could see doing something like: CREATE MATERIALIZED VIEW Foo AS SELECT ID, Name FROM Users; Postgres knows that ID is a primary key, and can thus update the view when Users changes. Had I not put in a primary key, I think one should be generated for me automatically based on every primary key in the referred tables. If tables do not have primary keys, you’d get a warning that the view has to be re-created on any change to the referred keyless tables. CREATE MATERIALIZED VIEW Foo AS SELECT State, AVG(Age) FROM Users GROUP BY State; Ok this is a tough one to figure out. Since the Age column is referred to, we could force a refresh every time Age in any row changes. In theory, the engine could be smart enough to realize the aggregate age came from a group, and thus update the appropriate “State” row when any Age within that state changed. Wow, this is getting tough; I see why Oracle just said no way on aggregate functions. CREATE MATERIALIZED VIEW Foo AS SELECT ID, GetStateFromZip(ZipCode) as State FROM Users; Ouch, a function. Well, the engine could look at the volatility of my GetStateFromZip function and make an assumption that passing in value X would always result in Y for nonvolatile functions. Then, update row ID when ZipCode changes. However, if the function is volatile or the data that the function itself uses changes, then we’d definitely run into issues. Two options: One, you’d issue a NOTICE and say something like “The source of column ‘State’ cannot be inferred.” and create the view anyway. Maybe there could be some STRICT option or pragma to simply not allow creating the dangerous views that could get stale. Another option, the user could provide the insight the engine needs to create the view. CREATE MATERIALIZED VIEW Foo REFRESH ON (ZipCodes Z WHERE Z.Zip = Users.Zip) AS SELECT ID, GetStateFromZip (ZipCode) as State FROM Users; Now we’ve told the engine that when any column in the ZipCodes table changes (you could probably qualify this further), then to update the materialized view of any Users row with that zipcode. Ok, this isn’t 100% ironed out but the idea is that I can provide instructions on how to handle situations that cannot be inferred automatically. I really like the idea of having the database do the best it can with the data it has, and issue NOTICEs if your view won’t be perfect. Then, allowing me to provide hints as to when and how to update the pieces of data that cannot be automatically inferred if I want to get around those warnings. To me, it seems more polite than insisting the database knows all and when it doesn’t, denying the very existence of the view completely. One could argue that if you’re going this far, you’ve basically turned the whole trigger mechanism inside out and one could already implement this whole thing on Postgres 9 using real tables and triggers when the appropriate data changes. This is something I’m struggling with as well. Materialized views seem to be great for quick database snapshots and very simple selects, but anything too complicated and it’s either not an option, or you’d have to explain so much logic that you’re better off using a series of triggers and writing a custom solution anyway. This makes me really question the fundamental use of materialized views. In other words, what is a real solid scenario that you would use one for that simply cannot be done currently using triggers? Or, is it simply the “ease of creation” people are after? There must be things I’m just not seeing, as this is the number one feature request on postgresql.uservoice.com by a massive landslide, and I fail to believe all those people are just “lazy.” -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general