Re: Estimate maintenance_work_mem for CREATE INDEX

Alex Shulgin <alex.shulgin@xxxxxxxxx> · Tue, 19 Dec 2017 22:32:33 +0000

On Tue, Dec 19, 2017 at 3:15 PM Greg Stark <stark@xxxxxxx> wrote:
On 19 December 2017 at 10:00, Oleksandr Shulgin

<oleksandr.shulgin@xxxxxxxxxx> wrote:

> If there would be an option in the database itself to provide those

> estimation, we wouldn't even need to figure out estimation queries.

> "EXPLAIN CREATE INDEX" anyone?

You're not the first to propose something like that. I think an

EXPLAIN ALTER TABLE would also be very handy -- it's currently

impossible to tell without carefully reading the source code whether a

given DDL change will require a full table scan, a full table rewrite,

or just a quick meta data update (and even in that case what strength

lock will be required). I think there are other utility statements

that make interesting heuristic decisions that would be nice to be

able to have some visibility into -- CLUSTER comes to mind.

Yes, that would be pretty handy.

I'm not clear how you would determine how much memory is needed to

sort a table without actually doing the sort though. So that would be

more of an EXPLAIN ANALYZE wouldn't it?

My idea would be to use statistic.  So that EXPLAIN CREATE INDEX (or whatever the actual interface could be like) would benefit from up-to-date statistic produced by ANALYZE.

Based on the estimated number of rows in the table, average width of column(s) to index and taking into account the bookkeeping structures one should be able to arrive at a good guess for the amount of memory the backend would end up allocating (assuming it is available).

Having done that, as the first step, and using statistic again we could also infer (though, probably with less accuracy) memory requirements for building partial indexes.  Functional indexes would be harder to tackle, I would think this is only possible if the return type(s) of the function(s) has all fixed width.

I didn't look in the code, but I imagine the procedure to read -> sort -> spill to tapes, if needed -> merge sort the tapes is generic to all index types, so this shouldn't be a breaking change for any user-defined indexes (is this already a thing?).  OK, maybe it's only generic for B-Tree and BRIN, but not for GIN and GiST, to name a few.  Damn, I gotta look in the code at some point. ;-)

To let me fantasize a little more, what I would also love to see is the estimated on-disk size for the resulting index, before starting to create it.  This is obviously dependent on the actual index type and options, such as fill-factor, etc.

Cheers,
--
Alex