The thought:

- Load the big chunk of data into a new table
- Generate some minimal set of indices on the new table
- Generate four queries that compare old to new:
    q1 - See which tuples are unchanged from yesterday to today
    q2 - See which tuples have been deleted from yesterday to today
    q3 - See which tuples have been added
    q4 - See which tuples have been modified

If the "unchanged" set is extremely large, then you might see benefit to doing updates based on deleting the rows indicated by q2, inserting rows based on q3, and updating based on q4. In principle, computing and applying those 4 queries might be quicker than rebuilding from scratch. In principle, applying q2, then q4, then vacuuming, then q3, ought to be "optimal."
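For concreteness, here's a minimal sketch of what those queries and the apply step might look like. The table and column names (old_table, new_table, id, payload) are hypothetical stand-ins; a real table would compare all of its non-key columns, not a single payload. IS DISTINCT FROM is used for a NULL-safe comparison:

    -- q1: tuples unchanged from yesterday to today
    SELECT n.id
      FROM new_table n
      JOIN old_table o USING (id)
     WHERE n.payload IS NOT DISTINCT FROM o.payload;

    -- q2: tuples deleted (present yesterday, gone today)
    SELECT o.id
      FROM old_table o
      LEFT JOIN new_table n USING (id)
     WHERE n.id IS NULL;

    -- q3: tuples added (present today, absent yesterday)
    SELECT n.id
      FROM new_table n
      LEFT JOIN old_table o USING (id)
     WHERE o.id IS NULL;

    -- q4: tuples modified (same key, different payload)
    SELECT n.id
      FROM new_table n
      JOIN old_table o USING (id)
     WHERE n.payload IS DISTINCT FROM o.payload;

And the apply step in the suggested order, q2 then q4, vacuum, then q3:

    -- apply q2: delete the rows that disappeared
    DELETE FROM old_table o
     WHERE NOT EXISTS (SELECT 1 FROM new_table n WHERE n.id = o.id);

    -- apply q4: update the rows whose payload changed
    UPDATE old_table o
       SET payload = n.payload
      FROM new_table n
     WHERE n.id = o.id
       AND n.payload IS DISTINCT FROM o.payload;

    -- reclaim the space freed by the deletes and updates
    -- (note: VACUUM cannot run inside a transaction block)
    VACUUM old_table;

    -- apply q3: insert the brand-new rows
    INSERT INTO old_table (id, payload)
    SELECT n.id, n.payload
      FROM new_table n
     WHERE NOT EXISTS (SELECT 1 FROM old_table o WHERE o.id = n.id);

In practice q1 only needs to be run (or just counted) to decide whether the unchanged set is large enough to make this incremental path cheaper than rebuilding from scratch.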
This looks like an interesting idea, and I'm going to take a look at how feasible it'll be to implement. I may be able to combine this with Mr. Wagner's idea to make a much more efficient system overall. It's going to be a pretty big programming task, but I have a feeling this summarizer may just need to be rewritten around a smarter scheme like this to get something faster.
Thanks! Steve