Re: How to do faster DML

On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v <veema0000@xxxxxxxxx> wrote:
> 
>     So we were thinking that adding many columns to a table should be fine
>     in Postgres (here we have a use case in which the total number of
>     columns may reach ~500+). But accessing a column towards the end of a
>     row is going to take more time than accessing a column at the start of
>     the row. As you mentioned, accessing the 100th column may take 4 to 5
>     times as long as accessing the first column. So, is it advisable to
>     take a similar approach and break the table in two if the total number
>     of columns for a table reaches a certain threshold?
> 
> 
> I'm not sure what exactly Peter was testing to get those 4-5x figures,

Sorry, I should have included my benchmark code (it's short enough - see
below).

What I was actually timing was

    select count(*) from t_postgresql_column_bench where v{i} = 'a'

for various i.

> but I presume that is column access time,

That was the goal. Of course there is always some overhead, but I
figured that by counting rows where a column has a constant value, the
overhead would be minimal, or at least constant across columns.
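
One way to sanity-check that assumption is to look at where the time
actually goes with EXPLAIN ANALYZE (standard PostgreSQL, nothing
specific to my script), e.g.:

    explain (analyze, timing on)
    select count(*) from t_postgresql_column_bench where v99 = 'a';

Comparing the reported execution time for v99 with that for v0 should
mostly reflect the extra cost of walking further into each tuple.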

> which would not mean a direct effect on your total query time of 4-5x.

Right. In any real application the column access time is only one part
of the total processing time, and probably a small one, so the effect
on total processing time is correspondingly smaller.
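
For what it's worth, the kind of split veem is asking about (vertical
partitioning into a narrow, frequently accessed table and a wide,
rarely accessed companion table sharing the same key) would look
roughly like this. The table and column names here are invented for
illustration, not taken from veem's actual schema:

    create table orders_hot (
        order_id   bigint primary key,
        status     text,
        created_at timestamptz
        -- ... other frequently accessed columns
    );

    create table orders_cold (
        order_id bigint primary key
                 references orders_hot (order_id),
        note     text
        -- ... the long tail of rarely accessed columns
    );

    -- Most queries touch only the narrow table; the join is
    -- paid only when a "cold" column is actually needed:
    select h.status, c.note
    from orders_hot h
    join orders_cold c using (order_id)
    where h.order_id = 42;

Whether that's worth the extra join depends on how often the cold
columns are actually read, so I'd benchmark with a realistic workload
before committing to it.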

        hp

------------------------------------------------------------------------
#!/usr/bin/python3

import random
import time

import psycopg2

n_cols = 100
n_rows = 100000

db = psycopg2.connect("")
csr = db.cursor()

# Create a table with n_cols text columns named v0 .. v{n_cols-1}.
csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)

# Fill every column of every row with a random lowercase letter.
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"
for j in range(n_rows):
    v = [chr(random.randint(96 + 1, 96 + 26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

# Time a count over each column in turn. The result is fetched inside
# the timed region but printed outside it, so terminal I/O doesn't
# skew the measurement.
for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(r)
    print(i, t1 - t0)
db.commit()
------------------------------------------------------------------------

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
