A cool idea, but if I understand it correctly very specific and fussy. New DB’s are spawned on this model, and all the developers would have to be aware of this non-standard behaviour, and DBA”s would have to create these indexes every month, for every DB (as the log tables are created every month). There are 89 session_id values in the January log (log_2013_01) so this would quickly get out of control. But – like I said – an interesting idea for more specific challenges.
From: Marc Mamin [mailto:M.Mamin@xxxxxxxxxxxx]
Sent: February 21, 2013 2:41 PM
To: Jeff Janes; Carlo Stonebanks
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: AW: Are bitmap index scans slow to start?
>Rebuilding the index might help, as it would put all the leaf pages holding values for session_id=27 adjacent to each other, so they would read from disk faster. But with a name like >"session_id", I don't know how long such clustering would last though.
>If I'm right about the index disk-read time, then switching to a plain index scan rather than a bitmap index scan would make no difference--either way the data has to come off the disk.
>>I'd prefer a
>>strategy that allowed fast performance the first time, rather than slow the
>>first time and extremely fast subsequently.Hello,
if the index is only used to locate rows for single session_id, you may consider split it in a set of partial indexes.
e.g.
create index i_0 on foo where session_id%4 =0;
create index i_1 on foo where session_id%4 =1;
create index i_2 on foo where session_id%4 =2;
create index i_3 on foo where session_id%4 =3;
(can be built in parallel using separate threads)
Then you will have to ensure that all your WHERE clauses also contain the index condition:
WHERE session_id = 27 AND session_id%4 =27%4
regards,
Marc Mamin
I can't really help, but I can make it more clear why postgres is choosing a _bitmap_ index scan rather than a regular index scan. With a regular index scan it pumps the index for the locations of the rows that it points to and loads those rows as it finds them. This works great if the rows in the index are sorta sorted - that way it isn't jumping around the table randomly. Random io is slow. In a bitmap index scan pg pumps the index and buffers the by shoving them in a big bitmap. Then, it walks the bitmap in order to produce in order io. PG makes the choice based on a measure of the index's correlation.
The problem comes down to you inserting the sessions concurrently with one another. My instinct would be to lower the FILLFACTOR on newly created indecies so they can keep their entries more in order. I'm not sure why I have that instinct but it feels right. Also, you might could try clustering newly created tables on session_id and setting the fillfactor down so rows with the same session id will stick together on disk.
Now that I look stuff up on the internet I'm not sure where I saw that pg tries to maintain a cluster using empty space from FILLFACTOR but I _think_ it does. I'm not sure what is going on with my google foo today.
Nik
On Fri, Feb 22, 2013 at 12:50 PM, Carlo Stonebanks <stonec.register@xxxxxxxxxxxx> wrote: