Re: strange sql behavior

One thing to look at is the last time both tables were vacuumed/analyzed.

SELECT n.nspname,
       s.relname,
       c.reltuples::bigint,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       date_trunc('second', last_vacuum) as last_vacuum,
       date_trunc('second', last_autovacuum) as last_autovacuum,
       date_trunc('second', last_analyze) as last_analyze,
       date_trunc('second', last_autoanalyze) as last_autoanalyze,
       round(current_setting('autovacuum_vacuum_threshold')::integer
             + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples) AS av_threshold
  FROM pg_stat_all_tables s
  JOIN pg_class c ON c.oid = s.relid
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE s.relname NOT LIKE 'pg_%'
   AND s.relname NOT LIKE 'sql_%'
   AND s.relname IN ('your_small_table', 'your_large_table')
 ORDER by 1, 2;
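
If last_analyze/last_autoanalyze comes back NULL or very old for either table, a manual ANALYZE will refresh the planner's statistics. For example (the table names here are placeholders, substitute your own):

ANALYZE your_small_table;
ANALYZE your_large_table;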


Also, please confirm the indexes for both tables are using the same method (btree?).
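
A quick way to compare them is to list the index definitions, which include the access method. For example (again substituting your actual table names):

SELECT schemaname, tablename, indexname, indexdef
  FROM pg_indexes
 WHERE tablename IN ('your_small_table', 'your_large_table')
 ORDER BY tablename, indexname;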

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@xxxxxxxxx> wrote:
Hi there,

Recently I have been working with a large amount of taxi GIS data and have encountered some weird performance issues.  I am hoping someone in this community can help me figure it out.

The taxi data were loaded into a table in 5-minute blocks.  I have two such tables: one stores a month of data with about 700 million rows, the other stores about 10 days of data with about 300 million rows.  The two tables have exactly the same schema and indexes.  There are two indexes: one on taxiid (text), and the other on the time stamp (date time).  In order to process the data, I need to get all points for a single taxi; to do that, I use something like:
 select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzles me greatly is that this query consistently runs much faster on the large table than on the small table, which seems to contradict intuition.   At the end of this message you will find EXPLAIN (ANALYZE, BUFFERS) results for two particular queries on the same taxiid (one for each table).  You can see that it took much longer (more than 20 times) to get 20k rows from the small table than to get 44k rows from the large table.   Interestingly, the planner does expect about 1/3 of the work for the small-table query, yet for some reason it took much longer to fetch the rows from the small table.   Why is there such a huge performance difference between two seemingly identical queries executed on two different tables?

Is it because the data in the second table sits on some mysteriously "broken" part of the disk?  What else could explain such bizarre behavior?  Your help is greatly appreciated.

The above behavior is consistent across all queries.   Another issue I identified is that the large table can use the shared buffers much more effectively.  For example, if you query one taxiid and immediately afterwards run the same query for another taxi whose id sorts right behind the first one, the shared buffer hits are quite high (and the query runs much faster); this never works for the small table.
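
For reference, a back-to-back test looks like this (the second taxiid below is only a made-up example of an id that sorts immediately after the first; table1 is the same table as above):

explain (analyze, buffers)
  select * from table1 where taxiid = 'SZB00S41' order by time;
-- run immediately afterwards, with an id that sorts right after the first
explain (analyze, buffers)
  select * from table1 where taxiid = 'SZB00S42' order by time;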

Thanks   a lot!

Best, Marco


Results for the small table: it took 141 seconds to finish.  The estimated total cost is 85256.31.

"Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)"
"  Sort Key: "time""
"  Sort Method: quicksort  Memory: 3622kB"
"  Buffers: shared hit=92 read=19816"
"  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
"        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
"        Heap Blocks: exact=19826"
"        Buffers: shared hit=92 read=19816"
"        ->  Bitmap Index Scan on data2013_01w_ixtaxiid  (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)"
"              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
"              Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"

Results for the large table: it took 5 seconds to finish.  The estimated total cost is 252077.10.
"Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)"
"  Sort Key: "time""
"  Sort Method: quicksort  Memory: 7753kB"
"  Buffers: shared hit=2 read=7543"
"  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
"        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
"        Heap Blocks: exact=7372"
"        Buffers: shared hit=2 read=7543"
"        ->  Bitmap Index Scan on data2011_01_ixtaxiid  (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)"
"              Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
"              Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

