Building lots of indices in parallel

Ron <ronljohnsonjr@xxxxxxxxx> · Thu, 11 Feb 2021 21:31:40 -0600

We're migrating a big database (200ish tables 30 of which have bytea or xml 
fields; the Oracle size is 10TB) to Postgresql 12.5 (RDS, if it matters), 
and after the loads are completed, it will be time to create the secondary 
indices (tables currently only have PK constraints), and they of course need 
to be built as fast as possible.

The first things to do are based in Postgres itself:
ALTER TABLE foo SET (parallel_workers = 4);
SET max_parallel_maintenance_workers TO 4;
CREATE INDEX foo_ind2 ON foo (f1);

That creates one index very quickly, but only one index at a time, wasting 
the other CPUs (there are 48 in the VM, and 384GB RAM).

What is the impact on locking/contention of using something like GNU 
Parallel (especially when it tries to create multiple indices on the same 
table in parallel)?

Is there a better way to quickly create lots of indices?

--
Angular momentum makes the world go 'round.