On Sun, Sep 11, 2005 at 11:00:02PM -0800, Poul Jensen wrote: > Tom Lane wrote: > > >No, tableoid is sort of a virtual column ... it doesn't exist on disk. > >When you query it you get a value fetched from the internal data > >structure representing the table. > > > > > So virtual columns are possible - THIS is a way to clear redundant data! > Is it possible for a user to create a virtual column? If not, this would > make a big improvement. > > What I really need are "partial virtual columns". I'm imagining an > alternative version of VACUUM ANALYZE that could do the following: > 1) Order the rows in the table so that for each column, identical values > are placed next to each other for as far as possible (the row order that > optimizes one column will probably not be optimal for other columns). > 2) For each column, identify the stretches that contain only one > distinct value. Save that value together with ID of start and end row > and delete stretch. > It is not obvious how to do a perfect optimization process in 1), at > least not to me - I'm sure a skilled mathematician would know exactly > how to do it. But here's a simple approach that would get us part of the > way: > 1.1) Grab the column w. most redundancy (fewest distinct values) and > sort it into groups according to the distinct values. > 1.2) For each of these groups, grab the column w. next most redundancy > and sort into groups according to the distinct values. > And so on. Stop whenever groups become so small that there's nothing to > gain. > Such an analysis would make it much less expensive to combine > same-schema tables, and having everything in the same table is really > convenient. It would obviously save a lot of storage space, but I > imagine it would enable more efficient queries too - having to check > just 3 values instead of the thousands (or even millions) they may > replace must give a considerable gain. > > 'What is the big benefit of not having ordered rows? I imagine it could > be a disadvantage for dynamic databases, but for a static database like > mine which won't be modified, except for maybe adding new data once a > year, I imagine an optimization including row ordering could be highly > beneficial. Oracle supports something akin to this. On an Index Organized Table you can tell it to pull part of the index key out of individual rows. IE: CREATE TABLE (a, b, c, d, e, f) PRIMARY KEY(a, b, c, d) INDEX ORGANIZED COMBINE(a, b) ; In this case, every time the combination of (a, b) changes, Oracle stores a special record of some kind that indicates the change, and it doesn't store a or b with each row. (Note that I'm guessing on syntax, it's been a while since I've used Oracle). > ############################## > > Jim C. Nasby wrote: > > >What you seem to be looking for is a form of partitioning. PostgreSQL > >doesn't currently support partitioning of this form, but there's work in > >progress to change that. > > > > > Any idea how far out in the future this is? Would it make the > optimization process described above (reply to Tom Lane) obsolete? Well, > maybe my ideas about an optimal solution just illustrate lack of > knowledge about SQL, but I'm hoping somebody can see what I'm trying to > suggest. Well, the firslt important step is slated for 8.1; effective partition elimination. You can search for more info, especially in the BizGres archives. Next step is creating actual partitioning syntax to make managing partitions easier. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@xxxxxxxxxxxxx Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq