
Re: How batch processing works


 



On 9/21/24 07:36, Peter J. Holzer wrote:
On 2024-09-21 16:44:08 +0530, Lok P wrote:

---------------------------------------------------------------------------------------------------
#!/usr/bin/python3

import time
import psycopg2

num_inserts = 10_000
batch_size = 50

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
for i in range(1, num_inserts+1):
    csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
    if i % batch_size == 0:
        db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts with Commit after {batch_size}  Rows: {elapsed_time:.3} seconds")

# vim: tw=99
---------------------------------------------------------------------------------------------------

FYI, this is less of a problem with psycopg(3) and pipeline mode:

import time
import psycopg

num_inserts = 10_000
batch_size = 50

db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
with db.pipeline():
    for i in range(1, num_inserts+1):
        csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
        if i % batch_size == 0:
            db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")


For a remote connection to a database in another state, that took the time from:

Method 2: Individual Inserts with Commit after 50  Rows: 2.42e+02 seconds

to:

Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after 50 Rows: 9.83 seconds

#!/usr/bin/python3

import itertools
import time
import psycopg2

num_inserts = 10_000
batch_size = 50

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
batch = []
for i in range(1, num_inserts+1):
    batch.append((i, 'a'))
    if i % batch_size == 0:
        q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
        params = list(itertools.chain.from_iterable(batch))
        csr.execute(q, params)
        db.commit()
        batch = []
if batch:
    q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
    csr.execute(q, list(itertools.chain.from_iterable(batch)))
    db.commit()
    batch = []

end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 3: Batch Inserts ({batch_size})  with Commit after each batch: {elapsed_time:.3} seconds")

# vim: tw=99
---------------------------------------------------------------------------------------------------
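A note on the flattening step in the script above: itertools.chain(batch) iterates over the row tuples themselves, while chain.from_iterable(batch) flattens them into the scalar parameter list that the %s placeholders expect:

```python
# Difference between chain() and chain.from_iterable() when turning a
# list of row tuples into a flat parameter list for the %s placeholders.
import itertools

batch = [(1, 'a'), (2, 'a')]

flat = list(itertools.chain.from_iterable(batch))
# -> [1, 'a', 2, 'a']: one value per placeholder, correct

nested = list(itertools.chain(batch))
# -> [(1, 'a'), (2, 'a')]: tuples left intact, wrong parameter count
```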

The above can also be handled with execute_batch() and execute_values() from:

https://www.psycopg.org/docs/extras.html#fast-execution-helpers


On my laptop, method2 is about twice as fast as method3. But if I
connect to a database on the other side of the city, method2 is now more
than 16 times faster than method3, simply because the delay in
communication is now large compared to the time it takes to insert those
rows.
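A quick sanity check on the remote numbers (illustrative arithmetic only, assuming the network round trip dominates each INSERT):

```python
# Back-of-the-envelope check: if each single-row INSERT waits one
# network round trip, total time ~= num_inserts * round_trip.
# Figures taken from the timings quoted above; the per-trip latency
# is inferred, not measured.
num_inserts = 10_000
total_seconds = 242  # Method 2, plain psycopg2, remote server (2.42e+02 s)
per_trip_ms = total_seconds / num_inserts * 1000
print(f"~{per_trip_ms:.1f} ms per round trip")  # ~24.2 ms
```

Roughly 24 ms per round trip is plausible WAN latency, which is why pipeline mode (which avoids waiting on each statement individually) cuts the total to under 10 seconds.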

         hp


--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





