Improve postgres 13.2 performance for concurrent bulk insert

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, 

What I'm trying to do:

  • I have 3rd party DB (any type Postgres, Oracle...) I have no control over it. Let's call it REMOTE_DB
  • I have my own Postgres instance 13.2. I fully control it. Let's call it MY_PG
  • REMOTE_DB has a 20-200M records table
  • I need from time to time to fetch 100K - 1M records from REMOTE_DB, insert into MY_PG and then do sophisticated joins to produce some analytical results. Let's call it REMOTE_DB_FETCH
  • REMOTE_DB_FETCH saves records to MY_PG
  • MY_PG has table that mimics schema of source table stored in REMOTE_DB. It mimics PK too.
  • there are concurrent processes that run REMOTE_DB_FETCH to insert data into MY_PG

Long story short:

  • I cache subset of REMOTE_DB at MY_PG
  • Do analytics in MY_PG

What I've tried to far

What else can I try in order to UPSERT from REMOTE_DB to MY_PG faster from concurrent REMOTE_DB_FETCH processes?

I've created isolated test to try out performance. Inserting around 48K rows into a table with 150 fields with only PK constraint on to fields: varchar and bigint.

Table is unlogged

isUnlogged: [{relpersistence=u, relname=cache_1694}, {relpersistence=u, relname=cache_1694_pkey}] 

sql

INSERT INTO "cache_1694" ("varchar_column", "bigint_column", "another_column" /* more column 150 in total */) VALUES (?, ?, ? /**/)  
ON CONFLICT ("varchar_column", "bigint_column") DO NOTHING;

It took 30 sec to insert 48819 records. looks slow, what can I try else? I'm running postgres on my Mac Pro with an SSD disk. Here are the settings:

postgres:
    image: postgres:13.2
    volumes:
      - ~/pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=pguser
      - POSTGRES_PASSWORD=pguser
      - POSTGRES_DB=pgdb
      - PGDATA=/var/lib/postgresql/data/pgdata

    command:
      - "postgres"
      - "-c"
      - "max_connections=50"
      - "-c"
      - "shared_buffers=1GB"
      - "-c"
      - "effective_cache_size=1GB"
      - "-c"
      - "synchronous_commit=off"
SELECT * FROM pg_settings where pg_settings.name like '%commi%'

shows

  {
    "name": "synchronous_commit",
    "setting": "off"
  }
]

Strange thing is that LOGGED table performance is not much greater than UNLOGGED. Same 30 sec for 48819 inserted duplicated rows



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux