Re: Why so slow?

"Jim C. Nasby" <jnasby@xxxxxxxxxxxxx> · Fri, 28 Apr 2006 09:58:43 -0500

On Fri, Apr 28, 2006 at 11:41:06AM +0000, Bealach-na Bo wrote:
> OK, here is a much more detailed output. I still don't quite
> understand why simple queries like counting the number of rows in a
> table should take minutes to complete. Surely, any performance
> enhancement to be had by vacuuming is closely related to indexes
> which, in turn, are closely related to sorting and searching. A simple
> count of 365590 does not involve indexes (or does it??) and should not take 
> minutes. Should I be forcing the
> way postgresql plans my queries?
> 
> Here is my first attempt at vacuum that got nowhere and I had to
> cancel it.
> 
> ----------psql session start----------
> vacuum verbose analyze job_log;
> INFO:  vacuuming "job_log"
> INFO:  index "job_log_id_pkey" now contains 10496152 row versions in 59665 
> pages
> DETAIL:  0 index row versions were removed.
> 28520 index pages have been deleted, 20000 are currently reusable.
> CPU 1.44s/3.49u sec elapsed 33.71 sec.
> INFO:  index "idx_job_stop_filter" now contains 10496152 row versions in 
> 71149 pages
> DETAIL:  0 index row versions were removed.
> 24990 index pages have been deleted, 20000 are currently reusable.
> CPU 2.11s/3.61u sec elapsed 115.69 sec.
> INFO:  index "idx_job_start_filter" now contains 10496152 row versions in 
> 57891 pages
> DETAIL:  0 index row versions were removed.
> 19769 index pages have been deleted, 19769 are currently reusable.
> CPU 1.58s/3.44u sec elapsed 23.11 sec.
> Cancel request sent
> ----------psql session finish----------
> 
> 
> I thought that combining indexes would improve things and dropped the
> 3 separate ones above and created this one
> 
> ----------psql session start----------
> create index idx_job_log_filter on job_log(job_name,job_start,job_stop);
> 
> select count(*) from job_log;
> count
> --------
> 365590
> (1 row)

The above shows that the indexes contained 10M rows and 160M of dead
space each. That means you weren't vacuuming nearly enough.

> explain analyse select count(*) from job_log;
>                                                           QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=1382171.88..1382171.88 rows=1 width=0) (actual 
> time=207011.882..207011.883 rows=1 loops=1)
>   ->  Seq Scan on job_log  (cost=0.00..1381257.90 rows=365590 width=0) 
> (actual time=199879.510..206708.523 rows=365590 loops=1)
> Total runtime: 207014.363 ms
> (3 rows)
> ----------psql session finish----------
> 
> Then I tried another vacuum and decided to be very patient
> 
> ----------psql session start----------
> vacuum verbose analyze job_log;
> INFO:  vacuuming "job_log"
> INFO:  index "job_log_id_pkey" now contains 10496152 row versions in 59665 
> pages
> DETAIL:  0 index row versions were removed.
> 28520 index pages have been deleted, 20000 are currently reusable.
> CPU 1.39s/3.39u sec elapsed 24.19 sec.
> INFO:  index "idx_job_log_filter" now contains 365590 row versions in 15396 
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.59s/0.20u sec elapsed 10.28 sec.
> INFO:  "job_log": removed 2795915 row versions in 368091 pages
> DETAIL:  CPU 33.30s/30.11u sec elapsed 2736.54 sec.
> INFO:  index "job_log_id_pkey" now contains 7700230 row versions in 59665 
> pages
> DETAIL:  2795922 index row versions were removed.
> 37786 index pages have been deleted, 20000 are currently reusable.
> CPU 2.76s/6.45u sec elapsed 152.14 sec.
> INFO:  index "idx_job_log_filter" now contains 365590 row versions in 15396 
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.52s/0.20u sec elapsed 7.75 sec.
> INFO:  "job_log": removed 2795922 row versions in 220706 pages
> DETAIL:  CPU 19.81s/17.92u sec elapsed 1615.95 sec.
> INFO:  index "job_log_id_pkey" now contains 4904317 row versions in 59665 
> pages
> DETAIL:  2795913 index row versions were removed.
> 45807 index pages have been deleted, 20000 are currently reusable.
> CPU 2.22s/5.30u sec elapsed 129.02 sec.
> INFO:  index "idx_job_log_filter" now contains 365590 row versions in 15396 
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.50s/0.22u sec elapsed 7.61 sec.
> INFO:  "job_log": removed 2795913 row versions in 188139 pages
> DETAIL:  CPU 17.03s/15.37u sec elapsed 1369.45 sec.
> INFO:  index "job_log_id_pkey" now contains 2108405 row versions in 59665 
> pages
> DETAIL:  2795912 index row versions were removed.
> 53672 index pages have been deleted, 20000 are currently reusable.
> CPU 2.13s/4.57u sec elapsed 122.74 sec.
> INFO:  index "idx_job_log_filter" now contains 365590 row versions in 15396 
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.53s/0.23u sec elapsed 8.24 sec.
> INFO:  "job_log": removed 2795912 row versions in 187724 pages
> DETAIL:  CPU 16.84s/15.22u sec elapsed 1367.50 sec.
> INFO:  index "job_log_id_pkey" now contains 365590 row versions in 59665 
> pages
> DETAIL:  1742815 index row versions were removed.
> 57540 index pages have been deleted, 20000 are currently reusable.
> CPU 1.38s/2.85u sec elapsed 76.52 sec.
> INFO:  index "idx_job_log_filter" now contains 365590 row versions in 15396 
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.54s/0.31u sec elapsed 7.99 sec.
> INFO:  "job_log": removed 1742815 row versions in 143096 pages
> DETAIL:  CPU 12.77s/11.75u sec elapsed 1046.10 sec.
> INFO:  "job_log": found 12926477 removable, 365590 nonremovable row 
> versions in 1377602 pages

13M dead rows, and the table is 1.4M pages, or 11GB. No wonder it's
slow.

You need to run a vacuum full, and then you need to vacuum far more
often. If you're running 8.1, turn on autovacuum and cut each default
scale factor in half, to 0.2 and 0.1.

> DETAIL:  0 dead row versions cannot be removed yet.
> There were 7894754 unused item pointers.
> 0 pages are entirely empty.
> CPU 124.49s/117.57u sec elapsed 8888.80 sec.
> INFO:  vacuuming "pg_toast.pg_toast_17308"
> INFO:  index "pg_toast_17308_index" now contains 130 row versions in 12 
> pages
> DETAIL:  2543 index row versions were removed.
> 9 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.11 sec.
> INFO:  "pg_toast_17308": removed 2543 row versions in 617 pages
> DETAIL:  CPU 0.04s/0.05u sec elapsed 4.85 sec.
> INFO:  "pg_toast_17308": found 2543 removable, 130 nonremovable row 
> versions in 650 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.06s/0.06u sec elapsed 5.28 sec.
> INFO:  analyzing "rshuser.job_log"
> INFO:  "job_log": scanned 3000 of 1377602 pages, containing 695 live rows 
> and 0 dead rows; 695 rows in sample, 319144 estimated total rows
> VACUUM
> 
> 
> explain analyse select count(*) from job_log;
>                                                           QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=1382171.88..1382171.88 rows=1 width=0) (actual 
> time=207267.094..207267.095 rows=1 loops=1)
>   ->  Seq Scan on job_log  (cost=0.00..1381257.90 rows=365590 width=0) 
> (actual time=200156.539..206962.895 rows=365590 loops=1)
> Total runtime: 207267.153 ms
> (3 rows)
> 
> ----------psql session finish----------
> 
> 
> I also took snapshots of top output while I ran the above
> 
> 
> ----------top output start----------
> Cpu(s):  0.7% us,  0.7% sy,  0.0% ni, 49.7% id, 48.5% wa,  0.5% hi,  0.0% si
> Mem:   1554788k total,  1538268k used,    16520k free,     6220k buffers
> Swap:  1020024k total,      176k used,  1019848k free,  1404280k cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 3368 postgres  18   0 37492  29m  11m D  2.7  1.9   3:00.54 postmaster
> 
> 
> 
> Cpu(s):  0.7% us,  0.8% sy,  0.0% ni, 49.7% id, 48.5% wa,  0.3% hi,  0.0% si
> Mem:   1554788k total,  1538580k used,    16208k free,     2872k buffers
> Swap:  1020024k total,      176k used,  1019848k free,  1414908k cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 3368 postgres  15   0 37492  29m  11m D  2.3  1.9   5:26.03 postmaster
> 
> 
> Cpu(s):  0.5% us,  5.8% sy,  0.0% ni, 48.7% id, 44.4% wa,  0.5% hi,  0.0% si
> Mem:   1554788k total,  1538196k used,    16592k free,     1804k buffers
> Swap:  1020024k total,      176k used,  1019848k free,  1444576k cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 3368 postgres  15   0 20956  13m  11m D 11.0  0.9   6:25.10 postmaster
> ----------top output end----------
> 
> 
> I know my database needs a major redesign.  But I'm having a hard time
> explaining the poor performance nevertheless.
> 
> 
> Regards,
> 
> Bealach
> 
> 
> >From: Andreas Kretschmer <akretschmer@xxxxxxxxxxxxx>
> >To: pgsql-performance@xxxxxxxxxxxxxx
> >Subject: Re: [PERFORM] Why so slow?
> >Date: Thu, 27 Apr 2006 20:28:23 +0200
> >
> >Bealach-na Bo <bealach_na_bo@xxxxxxxxxxx> schrieb:
> >> The node table is tiny (2500 records).  What I'm pulling my hair out
> >> over is that ANY Query, even something as simple as select count(*)
> >> form job_log takes of the order of tens of minutes to complete. Just
> >> now I'm trying to run an explain analyze on the above query, but so
> >> far, it's taken 35min! with no result and there is a postgres process at
> >> the top of top
> >>
> >> What am I doing wrong??
> >
> >The 'explain analyse' don't return a result, but it returns the query
> >plan and importance details, how PG works.
> >
> >That's why you should paste the query and the 'explain analyse' -
> >output. This is very important.
> >
> >Anyway, do you periodical vacuum your DB? My guess: no, and that's why
> >you have many dead rows.
> >
> >20:26 < akretschmer|home> ??vacuum
> >20:26 < rtfm_please> For information about vacuum
> >20:26 < rtfm_please> see 
> >http://developer.postgresql.org/~wieck/vacuum_cost/
> >20:26 < rtfm_please> or 
> >http://www.postgresql.org/docs/current/static/sql-vacuum.html
> >20:26 < rtfm_please> or http://www.varlena.com/varlena/GeneralBits/116.php
> >
> >20:27 < akretschmer|home> ??explain
> >20:27 < rtfm_please> For information about explain
> >20:27 < rtfm_please> see 
> >http://techdocs.postgresql.org/oscon2005/robert.treat/OSCON_Explaining_Explain_Public.sxi
> >20:27 < rtfm_please> or http://www.gtsm.com/oscon2003/toc.html
> >20:27 < rtfm_please> or 
> >http://www.postgresql.org/docs/current/static/sql-explain.html
> >
> >
> >Read this links for more informations about vacuum and explain.
> >
> >
> >HTH, Andreas
> >--
> >Really, I'm not out to destroy Microsoft. That will just be a completely
> >unintentional side effect.                              (Linus Torvalds)
> >"If I was god, I would recompile penguin with --enable-fly."    (unknow)
> >Kaufbach, Saxony, Germany, Europe.              N 51.05082?, E 13.56889?
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 9: In versions below 8.0, the planner will ignore your desire to
> >       choose an index scan if your joining column's datatypes do not
> >       match
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>               http://archives.postgresql.org
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461