On 10/20/07, Rajarshi Guha <rguha@xxxxxxxxxxx> wrote:
> Hi, relating to my previous queries on doing spatial searches on 10M
> rows, it seems that most of my queries return within 2 minutes.
> Generally this is not too bad, though faster is always better.
>
> Interestingly, it appears that the CUBE index for the table in
> question is about 3GB (the table itself is about 14GB). Not knowing
> the details of the postgres internals, I assume that when a query
> tries to use the index, it will need to read a 3GB file. Is this a
> correct assumption?

Not sure. I assume not, but will wait for someone more conversant with
pgsql index usage to post that answer.

> In such a situation, is there a way to keep the index in memory? My
> machine has 8GB installed and currently has about 7.4GB free RAM (64
> bit linux 2.6.9)

The kernel will tend to keep it in memory. Usually it does a pretty
good job of that.

> A side effect of the size of the index is that if I do a query that
> performs a seq scan (say using cube_distance) it takes longer than
> when an index is used, but not significantly longer. And this is on a
> 10M row table.

Depending on how much of the table you're getting back, eventually a
seq scan will outperform an index scan, because in pgsql-land, you
ALWAYS have to hit the table whether there's an index entry or not,
for visibility reasons. So, at some point, a certain percentage of the
table being retrieved (1 to 25%) will trigger a sequential scan, and
rightly so.

> What strategies do people follow when the index becomes very big?

You can only fit so much data into memory. Once a db gets big enough
that it simply can't be stuffed into memory, you need a fast storage
subsystem.
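
On the "3GB file" question: an index scan only touches the pages along
its search path, not the whole file, and you can see for yourself how
much of the index is being served from PostgreSQL's buffer cache rather
than read from disk. The table and index names below are placeholders;
substitute your own:

  -- "compounds" and "compounds_coord_gist_idx" are hypothetical names.
  SELECT pg_size_pretty(pg_relation_size('compounds'))                AS table_size,
         pg_size_pretty(pg_relation_size('compounds_coord_gist_idx')) AS index_size;

  -- Index blocks read from outside the buffer cache (possibly still the
  -- OS page cache) vs. found in shared_buffers:
  SELECT indexrelname, idx_blks_read, idx_blks_hit
  FROM   pg_statio_user_indexes
  WHERE  indexrelname = 'compounds_coord_gist_idx';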
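
If you want to be more deliberate than relying on the kernel's cache,
newer PostgreSQL releases (9.4 and later, so not the 8.x series this
thread dates from) ship a contrib module, pg_prewarm, that pulls a
relation into the buffer cache explicitly. A minimal sketch, again with
a placeholder index name:

  CREATE EXTENSION IF NOT EXISTS pg_prewarm;
  -- Load the whole index into shared_buffers; it stays cached only
  -- until ordinary buffer eviction pushes it out again.
  SELECT pg_prewarm('compounds_coord_gist_idx');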
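
On the seq-scan point: the GiST index built by the cube module is only
used by its indexable operators (containment and overlap: <@, @>, &&).
A WHERE clause written purely as cube_distance(...) < r cannot use the
index and will scan the whole table. A common workaround is to
prefilter with an indexable bounding box and keep cube_distance as the
exact test. A sketch, assuming a hypothetical 3-D table "compounds"
with a GiST index on its "coord" cube column:

  EXPLAIN ANALYZE
  SELECT id
  FROM   compounds
  WHERE  coord <@ cube(array[0.4, 0.4, 0.4], array[0.6, 0.6, 0.6])  -- bounding box, can use the GiST index
  AND    cube_distance(coord, cube(array[0.5, 0.5, 0.5])) < 0.1;    -- exact distance check on the survivors

I believe recent releases (9.6 and later) also give cube a KNN
distance operator (<->) that an ORDER BY ... LIMIT query can use
directly against the GiST index.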
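
For the memory side of that, the two knobs that matter most are
PostgreSQL's own buffer cache and the planner's estimate of what the OS
is caching. Illustrative values only, for an 8GB box like the one
above; the right numbers depend on the workload:

  # postgresql.conf (illustrative, not a recommendation)
  shared_buffers = 2GB          # PostgreSQL's shared buffer cache
  effective_cache_size = 6GB    # planner hint: RAM likely available for caching, including the OS page cache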