On 2020-01-31 10:13:20 +0000, Sandip Pradhan wrote:
> Thanks for your time.
>
> Our PostgreSQL version: 9.5.9.14.
>
> We are using the COPY command to insert rows into the tables. While
> running the COPY command, all indexes are dropped.

So I take it that the copy is fast, but the building of indexes is slow
(actually I'm not sure whether less than 50 seconds per index is slow
for tables of this size, but it's too slow for your application).

> After successfully inserting all the rows, we are trying to create
> all those indexes.
>
> Example of index creation script:
>
> CREATE INDEX cwt_sr_assocact_entityref
>   ON cwt_sr_associationact
>   USING btree
>   (entityapplicationcontext COLLATE pg_catalog."default",
>    entitytype COLLATE pg_catalog."default",
>    entitydn COLLATE pg_catalog."default");
>
> CREATE INDEX ix_sr_assoc_customerrelateddn
>   ON cwt_sr_associationact
>   USING btree
>   (customerrelateddn COLLATE pg_catalog."default");
>
> We run the index creation in parallel.
>
> Please find the hardware details of the server:
> OS: Red Hat Enterprise Linux 7.4
> RAM: 125 GB
> CPU cores: 36
>
> maintenance_work_mem is set to 25 GB.

Ideally, you would want to use all the cores and each process should
use as much RAM as possible. However, if each process uses 25 GB, you
can only fit 4 of them (not 5, because other stuff needs memory, too)
in RAM at the same time, leaving 32 of your 36 cores idle. OTOH, with
36 processes each would have to be restricted to about 3 GB, which
doesn't sound ideal, either.

So there's probably a sweet spot somewhere in the middle.

What I would do:

To save time, I would test with only a single table (there are 29
indexes on each table, and that's almost certainly more than the
optimal degree of parallelism, so we won't gain anything from
processing several tables at once).

To establish a baseline, I'd look for the optimal parallelism with your
current parameters: build the indexes with 1, 2, 4, 8, 16, 29 processes
in parallel. Stop if it gets noticeably slower, especially if it starts
to swap heavily. Then bisect between the two best ones. Note the
results. (A rough, untested sketch of such a test run is appended at
the end of this mail.)

Then reduce maintenance_work_mem to 50 % and repeat the experiment.
Does it get better?

If it gets better (or at least not much worse), reduce
maintenance_work_mem again and repeat.

Eventually you should find the optimal combination.

To reach your goal of 40 minutes for all 10 tables, your optimal run
for a single table must take about 4 minutes. If you don't get close to
that, you will probably have to look for different solutions.

As already mentioned, the speed of I/O makes a lot of difference. If
you don't use SSDs yet, you should. If you do use SSDs, maybe you can
get faster ones?

You might also investigate putting pgsql_tmp on a RAM disk.

You could also try changing synchronous_commit and/or
fsync/full_page_writes (the latter two only if you can tolerate losing
your DB in a crash). A sketch of these I/O-related settings is also
appended below.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
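
A rough sketch of the parallel index build test (untested; it assumes
the CREATE INDEX statements for the test table are collected in a
hypothetical file indexes.sql, one complete statement per line, and
that the database is called mydb -- adjust the names, the degree of
parallelism and the memory setting to your environment):

    #!/bin/bash
    # Degree of parallelism and per-session maintenance_work_mem to test:
    N=8
    MWM=12GB

    # PGOPTIONS passes "-c name=value" settings to every backend session
    # started by psql below.
    export PGOPTIONS="-c maintenance_work_mem=$MWM"

    # Feed one CREATE INDEX statement per line to up to $N concurrent
    # psql sessions and measure the total wall-clock time.
    time xargs -d '\n' -n 1 -P "$N" psql -X -d mydb -c < indexes.sql

Concurrent CREATE INDEX statements on the same table only take SHARE
locks, so they don't block each other; vary N and MWM between runs and
compare the reported times.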
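
And a sketch of the I/O-related settings (again only an illustration:
the mount point, tmpfs size, tablespace and database names are made up,
and fsync = off / full_page_writes = off really can cost you the whole
cluster if the server crashes):

    #!/bin/bash
    # RAM disk for temporary (sort / index build) files:
    mkdir -p /mnt/pgtmp
    mount -t tmpfs -o size=64G tmpfs /mnt/pgtmp
    chown postgres:postgres /mnt/pgtmp

    # One way to point PostgreSQL's temporary files at the RAM disk is a
    # tablespace referenced by temp_tablespaces (needs a DB superuser):
    sudo -u postgres psql -X -d mydb <<'EOF'
    CREATE TABLESPACE ramtmp LOCATION '/mnt/pgtmp';
    ALTER SYSTEM SET temp_tablespaces = 'ramtmp';

    -- Durability trade-offs, only for the duration of the bulk load:
    ALTER SYSTEM SET synchronous_commit = off;  -- a crash loses only the latest commits
    ALTER SYSTEM SET fsync = off;               -- a crash can corrupt the whole cluster
    ALTER SYSTEM SET full_page_writes = off;    -- ditto
    SELECT pg_reload_conf();
    EOF

Remember to revert the durability settings (and reload) once the load
and the index builds are done.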