Alvaro Herrera wrote:
Carlos Moreno wrote:
That is: the first time I run the query, it has to go through the
disk; in the normal case it would have to read 100MB of data, but due
to bloating, it actually has to go through 2GB of data. Ok, but
then, it will load only 100MB (the ones that are not "uncollected
disk garbage") to memory. The next time that I run the query, the
server would only need to read 100MB from memory --- the result should
be instantaneous...
Wrong. If there is 2GB of data, 1900MB of which is dead tuples, those
pages would still have to be scanned for the count(*). The system does
not distinguish "pages which have no live tuples" from other pages, so
it has to load them all.
Yes, that part I understand --- I think I now know what the error is in
my logic. I was thinking as follows: We read 2GB of which 1900MB are
dead tuples. But then, once they're read, the system will only keep
in memory the 100MB that are valid tuples.
I'm now thinking that the problem with my logic is that the system does
not keep anything in memory (or not all tuples, in any case), since it
is only counting, so it does not *have to* keep them, and since the
total amount of reading from the disk exceeds the amount of physical
memory, then the valid tuples are "pushed out" of memory.
So, the second time I execute the query, it will still need to scan the
disk (in my mind, the way I was seeing it, the second time I execute
the "select count(*) from customer", the entire customer table would be
in memory from the previous time, and that's why I was thinking that
the bloating would not explain why the second time it is still slow).
Am I understanding it right?
Thanks for your patience!
Carlos
--
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org