Hi there,
Recently
I am working with a large amount of taxis GIS data and had encountered
some weird performance issues. I am hoping someone in this community
can help me figure it out.
The
taxi data were loaded in 5 minute block into a table. I have two
separate such tables, one stores a month of data with about 700 million
rows, another stores about 10 days of data with about 300 million rows.
The two tables have the exactly same schema and indexes. There are two
indexes: one on taxiid (text), and the other on the time stamp (date
time). In order to process the data, I need to get all points for a
single taxis; to do that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What
puzzled me greatly is that this query runs consistently much faster for
the large table than for the small table, which seems to contradict
with intuition. At the end of message you may find explain (analyze
buffer) results of two particular queries for the same taxiid (one for
each table). You can see that it took much longer (more than 20 times)
to get 20k rows from the small table than to get 44 k rows from the
large table. Interestingly it seems that the planner does expect about
1/3 work for the small table query - yet for some reason, it took much
longer to fetch the rows from the small table. Why there is such a
huge performance between the two seemingly identical queries executed on
two different tables?
Is
is because the data on the second table is on some mysteriously "broken
part" of the disk? what else could explain such a bizarre behavior?
Your help is greatly appreciated.
The
above behavior is consistent through all queries. Another issue I
identified is that for the large table, the query can use the shared
buffer more effectively. For example, after you query one taxiid and
immediately following that query run the same query for another taxi
whose id ranks right behind the first id, then shared hit buffers would
be quite high (and the query would run much faster); this however never
works for the small table.
Thanks a lot!
Best, Marco
Results for the small table: it took 141 seconds to finish. The planning time is 85256.31
"Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Index Scan on data2013_01w_ixtaxiid (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Index Scan on data2013_01w_ixtaxiid (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"
"Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 7753kB"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=7372"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Index Scan on data2011_01_ixtaxiid (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"