I want to start by thanking you guys for a quick response and I will try to provide all the information you request.
1) What version am I running:
"PostgreSQL 9.1.2, compiled by Visual C++ build 1500, 64-bit"
2) Schema:
CREATE TABLE test( id integer, dtstamp timestamp without time zone, rating real) WITH ( OIDS=FALSE);
CREATE INDEX test_all ON test USING btree (id , dtstamp, rating);
200M rows
Table size 9833MB
Index size 7653 MB
CREATE INDEX test_all ON test USING btree (id , dtstamp, rating);
200M rows
Table size 9833MB
Index size 7653 MB
3) Difference between the first and the second run time?
The statement executed is:
SELECT count(1) FROM test
WHERE id in (58,83,88,98,124,141,170,195,
WHERE id in (58,83,88,98,124,141,170,195,
202,252,265,293,305,331,348)
AND dtstamp between cast('2011-10-19 08:00:00' as timestamp) and cast('2011-10-19 16:00:00' as timestamp)
AND dtstamp between cast('2011-10-19 08:00:00' as timestamp) and cast('2011-10-19 16:00:00' as timestamp)
a) 1st run = 26 seconds
b) 2nd run = 0.234 seconds
c) 3rd-6th run = 0.06 seconds
If I perform the query above for another day then I get 26 seconds for the 1st query.
b) 2nd run = 0.234 seconds
c) 3rd-6th run = 0.06 seconds
If I perform the query above for another day then I get 26 seconds for the 1st query.
4) What was the execution plan of it
"Aggregate (cost=151950.75..151950.76 rows=1 width=0)"
" -> Bitmap Heap Scan on data_cbm_reading cbm (cost=1503.69..151840.82 rows=43974 width=0)"
" Recheck Cond: ((virtual_id = ANY ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND ("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND ("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
" -> Bitmap Index Scan on data_cbm_reading_all (cost=0.00..1492.70 rows=43974 width=0)"
" Index Cond: ((virtual_id = ANY ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND ("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND ("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
" -> Bitmap Heap Scan on data_cbm_reading cbm (cost=1503.69..151840.82 rows=43974 width=0)"
" Recheck Cond: ((virtual_id = ANY ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND ("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND ("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
" -> Bitmap Index Scan on data_cbm_reading_all (cost=0.00..1492.70 rows=43974 width=0)"
" Index Cond: ((virtual_id = ANY ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND ("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND ("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
5) In this case, I shut down the mssql server/machine and restart it. To be on the safe side, I ensured the cache is empty using dbcc freeproccache and dbcc dropcleanbuffers.
Then I tried the same statement as above:
a) 1st run = 0.8 seconds
b) 2nd, 3rd, ... run = 0.04 seconds
c) change the select statement for any another other day and run it again give 1st run 0.5 seconds
d) 2nd, 3rd, ... run = 0.04 seconds
6) You wrote "I doubt covering indexes is going to make that query 23x faster."b) 2nd, 3rd, ... run = 0.04 seconds
c) change the select statement for any another other day and run it again give 1st run 0.5 seconds
d) 2nd, 3rd, ... run = 0.04 seconds
I decided to check out how mssql performs if it cannot use a covering index. In order to do that, I drop my current index and create it again on id, dtstamp. That forces mssql to look into the data file and the index is no longer sufficient.
Running the following statement force the "rating" columns to be accessed:
select sum(rating)
FROM test
WHERE id in (58,83,88,98,124,141,170,195,202,252,265,293,305,331,348)
AND dtstamp >= '2011-10-19 08:00:00' AND dtstamp <= '2011-10-19 16:00:00'
a) 1st run = 20 seconds
b) 2nd run = 0.6
c) 3rd, ... run = 0.3 seconds
As you can see the response time gets just as bad as in Postgres.FROM test
WHERE id in (58,83,88,98,124,141,170,195,202,252,265,293,305,331,348)
AND dtstamp >= '2011-10-19 08:00:00' AND dtstamp <= '2011-10-19 16:00:00'
a) 1st run = 20 seconds
b) 2nd run = 0.6
c) 3rd, ... run = 0.3 seconds
Now lets recreate the mssql index with all the columns and double check the response time:
a) 1st run = 2 seconds
b) 2nd run = 0.12
c) 3rd, ... run = 0.3 seconds
b) 2nd run = 0.12
c) 3rd, ... run = 0.3 seconds
Therefore, I must conclude that in the case of mssql the "covering" index is making a huge impact.
I have spent the whole day providing this data (takes a while to shuffle 200M rows) and tomorrow I will try your suggestion regarding two indexes.
Do you think I should try using the latest build of the source for 9.2 since index-only-scan is "ready" according to http://www.depesz.com/index.php/2011/10/08/waiting-for-9-2-index-only-scans/
?
Thanks,
- Gummi
On Wed, Feb 1, 2012 at 7:35 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Wed, Feb 1, 2012 at 12:50 PM, Gudmundur Johannessonwhat's the difference between the first and the second run time?
<gudmundur.johannesson@xxxxxxxxx> wrote:
> Here are the answers to your questions:
> 1) I change the select statement so I am refering to 1 day at a time. In
> that case the response time is similar. Basically, the data is not in cache
> when I do that and the response time is about 23 seconds.
Note, if you are only interested in the date the dtStamp falls on, you
can exploit that in the index to knock 4 bytes off your index entry:
(id , (dtstamp::date) , rating);
CREATE INDEX test_all
ON test
USING btree
and then use a similar _expression_ to query it back out.
I asked you to time a different query. Look again (and I'd like to
> 3) The query takes 23 sec vs 1 sec or lower in mssql.
see cached and uncached times).
I doubt covering indexes is going to make that query 23x faster.
> We never update/delete and therefore the data is alway correct in the index
> (never dirty). Therefore, Postgres could have used the data in it.
>
> I started to add columns into indexes in Oracle for approx 15 years ago and
> it was a brilliant discovery. This looks like a show stopper for me but I
However, I bet we can get something worked out.
merlin