On 2020-01-31 10:13:20 +0000, Sandip Pradhan wrote:
> Thanks for your time.
>
> Our PostgreSQL version: 9.5.9.14.
>
> We are using the COPY command to insert rows into the tables. While
> running the COPY command, all indexes are dropped.

So I take it that the copy is fast, but the building of indexes is slow
(actually I'm not sure whether less than 50 seconds per index is slow
for tables of this size, but it's too slow for your application).

> After successfully inserting all the rows, we are trying to create
> all those indexes.
>
> Example of index creation script:
>
> CREATE INDEX cwt_sr_assocact_entityref
>   ON cwt_sr_associationact
>   USING btree
>   (entityapplicationcontext COLLATE pg_catalog."default",
>    entitytype COLLATE pg_catalog."default",
>    entitydn COLLATE pg_catalog."default");
>
> CREATE INDEX ix_sr_assoc_customerrelateddn
>   ON cwt_sr_associationact
>   USING btree
>   (customerrelateddn COLLATE pg_catalog."default");
>
> We run the index creation in parallel.
>
> Please find the hardware details of the server:
> OS: Red Hat Enterprise Linux 7.4
> RAM: 125 GB
> CPU cores: 36
>
> maintenance_work_mem is set to 25 GB.

Ideally, you would want to use all the cores and each process should
use as much RAM as possible. However, if each process uses 25 GB, you
can only fit 4 of them (not 5, because other stuff needs memory, too)
in RAM at the same time, leaving 32 of your 36 cores idle. OTOH, with
36 processes each would have to be restricted to about 3 GB, which
doesn't sound ideal, either.

So there's probably a sweet spot somewhere in the middle.

What I would do:

To save time, I would test with only a single table (there are 29
indexes on each table, and that's almost certainly more than the
optimal degree of parallelism, so we won't gain anything from
processing several tables at once).

To establish a baseline, I'd look for the optimal parallelism with your
current parameters: build the indexes with 1, 2, 4, 8, 16, 29 processes
in parallel. Stop if it gets noticeably slower, especially if it starts
to swap heavily. Then bisect between the two best ones. Note the
results. (A rough, untested sketch of such a test run is appended at
the end of this mail.)

Then reduce maintenance_work_mem to 50 % and repeat the experiment.
Does it get better?

If it gets better (or at least not much worse), reduce
maintenance_work_mem again and repeat.

Eventually you should find the optimal combination.

To reach your goal of 40 minutes for all 10 tables, your optimal run
for a single table must take about 4 minutes. If you don't get close to
that, you will probably have to look for different solutions.

As already mentioned, the speed of I/O makes a lot of difference. If
you don't use SSDs yet, you should. If you do use SSDs, maybe you can
get faster ones?

You might also investigate putting pgsql_tmp on a RAM disk.

You could also try changing synchronous_commit and/or
fsync/full_page_writes (the latter two only if you can tolerate losing
your DB in a crash). A sketch of these I/O-related settings is also
appended below.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
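
A rough sketch of the parallel index build test (untested; it assumes
the CREATE INDEX statements for the test table are collected in a
hypothetical file indexes.sql, one complete statement per line, and
that the database is called mydb -- adjust the names, the degree of
parallelism and the memory setting to your environment):

    #!/bin/bash
    # Degree of parallelism and per-session maintenance_work_mem to test:
    N=8
    MWM=12GB

    # PGOPTIONS passes "-c name=value" settings to every backend session
    # started by psql below.
    export PGOPTIONS="-c maintenance_work_mem=$MWM"

    # Feed one CREATE INDEX statement per line to up to $N concurrent
    # psql sessions and measure the total wall-clock time.
    time xargs -d '\n' -n 1 -P "$N" psql -X -d mydb -c < indexes.sql

Concurrent CREATE INDEX statements on the same table only take SHARE
locks, so they don't block each other; vary N and MWM between runs and
compare the reported times.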
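
And a sketch of the I/O-related settings (again only an illustration:
the mount point, tmpfs size, tablespace and database names are made up,
and fsync = off / full_page_writes = off really can cost you the whole
cluster if the server crashes):

    #!/bin/bash
    # RAM disk for temporary (sort / index build) files:
    mkdir -p /mnt/pgtmp
    mount -t tmpfs -o size=64G tmpfs /mnt/pgtmp
    chown postgres:postgres /mnt/pgtmp

    # One way to point PostgreSQL's temporary files at the RAM disk is a
    # tablespace referenced by temp_tablespaces (needs a DB superuser):
    sudo -u postgres psql -X -d mydb <<'EOF'
    CREATE TABLESPACE ramtmp LOCATION '/mnt/pgtmp';
    ALTER SYSTEM SET temp_tablespaces = 'ramtmp';

    -- Durability trade-offs, only for the duration of the bulk load:
    ALTER SYSTEM SET synchronous_commit = off;  -- a crash loses only the latest commits
    ALTER SYSTEM SET fsync = off;               -- a crash can corrupt the whole cluster
    ALTER SYSTEM SET full_page_writes = off;    -- ditto
    SELECT pg_reload_conf();
    EOF

Remember to revert the durability settings (and reload) once the load
and the index builds are done.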