On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v <veema0000@xxxxxxxxx> wrote:
> > So we were thinking, adding many column to a table should be fine in
> > postgres (as here we have a use case in which total number of columns
> > may go till ~500+). But then, considering the access of columns towards
> > the end of a row is going to add more time as compared to the column
> > which is at the start of the row. As you mentioned, accessing 100th
> > column may add 4 to 5 times more as compared to the access of first
> > column. So, is it advisable here to go for similar approach of breaking
> > the table into two, if the total number of column reaches certain
> > number/threshold for a table?
>
> I'm not sure of what Peter was testing exactly to get those 4-5x figures,

Sorry, I should have included my benchmark code (it's short enough - see
below). What I was actually timing was

    select count(*) from t_postgresql_column_bench where v{i} = 'a'

for various i.

> but I presume that is column access time,

That was the goal. Of course there is always some overhead, but I figured
that by counting rows where a column has a constant value the overhead is
minimal, or at least constant.

> which would not mean a direct effect on your total query time of 4-5x.

Right. In any real application the column access time is only a part of
the total processing time - and probably a small part - so the effect on
total processing time is correspondingly smaller.
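(As an aside: timings like this are noisy, and noise from caching or
scheduling only ever adds time, so repeating each query and keeping the
fastest run gives a more stable per-column figure. A minimal,
database-free sketch of that idea - the `best_of` helper and the lambda
stand-in below are just my illustration, not part of the script:)

```python
import time

def best_of(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time.

    The minimum is a reasonable estimator for micro-benchmarks, since
    external noise can only make an individual run slower, not faster.
    """
    best = float("inf")
    for _ in range(repeats):
        t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
        fn()
        t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
        best = min(best, t1 - t0)
    return best

# Stand-in workload; in the script below this would be
# lambda: csr.execute(q) for each column's query.
elapsed = best_of(lambda: sum(range(100000)))
print(f"best of 5: {elapsed:.6f}s")
```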
        hp

------------------------------------------------------------------------
#!/usr/bin/python3

import random
import time
import psycopg2

n_cols = 100
n_rows = 100000

db = psycopg2.connect("")
csr = db.cursor()

# Create a table with n_cols text columns v0 .. v99.
csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)

# Fill it with n_rows rows of random single lowercase letters.
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"
for j in range(n_rows):
    v = [chr(random.randint(96+1, 96+26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

# Time the same count(*) query against each column in turn.
for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    print(r)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(i, t1 - t0)
db.commit()
------------------------------------------------------------------------

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"