Re: Disabling/Enabling index before bulk loading

Tim Cross <theophilusx@xxxxxxxxx> · Wed, 11 Jul 2018 08:31:56 +1000

Ravi Krishna <srkrishna@xxxxxxxxx> writes:

> We recently did a test on COPY and found that on large tables (47 million rows , 20GB of raw data) the 
> difference in COPY with 16 indexes and COPY without any index is 1:14. That is, COPY is 14 times slower 
> when data is ingested with all indexes as opposed to COPY first without index and then create all index.
>

This is very system dependent. On our system, when we tested a similar
approach, we found that the time saved through dropping the indexes
before copy was lost when rebuilding the indexes afterwards. In fact, it
ended up being slightly slower.

I suspect a lot depends on the number and types of indexes you
have. Your example had a lot more indexes than ours. We were loading 22
batches with 2.5M records per batch. While copy was significantly faster
than transaction based inserts (minutes vs hours), the differences between
indexes and no indexes was measured in minutes. We only had 3 or 4
indexes.   

> I googled for earlier posting on this and it looks like this has been asked before too.  
>
> This is what I am thinking to do:
>
> 1 - Extract index definition and save it as a SQL somewhere, either a file or a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
>
> Is there a generic sql or script or tool to accomplish (1).
>

We are loading data via Javascript using pg and pg-copy-streams modules. It is
pretty straight forward to drop the indexes and recreate them afterwards
via sql, so we didn't look for a tool as such.

As data is only inserted into this table and only by this process, we
also turned off autovacuum for this table, performing vacuum and analyze
manually after load. 

Tim

-- 
Tim Cross